Automated Deployments: The Tester's Tale

Why should an automated deployment process be such an advantage? Once a process is scripted reliably, it can be repeated without human error, and is therefore testable. It also means that deployment is less tiresome, and is therefore done more often. Problems surface more quickly and are caught more easily. That's not all, as Ben the Tester explains.

This article is about what happened when the team that was responsible for deploying our intranet-based internal systems moved from a manual process to an automated one, and about the lessons we learned along the way.

I am a test engineer working as part of Red Gate’s DevOps team. My main task is to deploy pretty well all the intranet-based systems that we use to run the business. The majority of the applications we deploy are .NET-based. Obviously, with a traditional .NET stack we use SQL Server at the database layer.

During my time here doing deployments, our team has progressed from the bad old days where the testers in one of the development teams would chuck things over the wall to the Ops team to deploy, to being able to do almost instantaneous deployments, using one deployment mechanism whether the target is a test server, staging server or production.

The Bad Old Days of Manual Deployment

Once the testers had completed their testing cycle, they would decide which build to push out and would then throw a package over the wall to Operations in a zip file. This file generally contained the compiled web application, and a set of thoroughly detailed manual steps. If the project required any changes to the database, then the package would also contain database change scripts. For anything more than a minor change, it would also include a rollback script in case Ops needed to bail out of a failed deployment.

We did, on average, one deployment a week, maybe two, depending on how projects came together at the end.

The two teams worked independently. When a deployment was due, we’d arrange a time first thing in the morning when one person from each team would be available to do it. Why first thing in the morning? Doing it first thing meant that we had the best chance of the right people still being in the office if it went wrong. If something didn’t go exactly as planned, a member of the Dev team needed to be there to recognize what was missing. Because it was a series of manual steps, there was always the risk of accidentally missing something out, or of not getting a particular step quite right because of a difference between a test environment and the production environment. Ops were the only ones with access to the servers we needed to change – so when doing a deployment, we needed one of them to be around so we could get access to make the changes, and one of us who knew exactly what we wanted to change. An early start was also a precaution because, for external deployments, the overwhelming majority of our customers (in the US) wouldn’t be online for several hours.

If we were making a change to a database that we couldn’t easily recover from, such as a table modification that changed data, we used to lose a lot of time. In those situations, in order to give ourselves a rollback plan, we used to take a database backup. Within our architecture, there are two databases that pretty much everything we maintain uses, and so we used to have to be very careful.

Once you’ve done the deployment, you’ve got to do some kind of validation after the fact to make sure everything’s okay, and monitor the data to make sure BI reports and the like haven’t gone wrong. And of course you’d expect to spend at least as much time preparing for the deployment in advance, writing the script of manual steps. We reused the same template over and over again, but even so you still had to make sure everything was correct.

I used to write the deployment script, then run through it in test to make sure that I’d not missed anything out – so you’re talking about another hour. All in all, we’re talking about several hours to do a deployment when we were looking to ship.

How Did We Improve the Process?

The reason that we wanted to automate deployment was not only to reduce the pain of deploying to production, but also to make all the deployments repeatable across all our environments, which is very important. It’s obviously not a good idea to follow one process when you’re deploying to test, and then follow a completely separate one when deploying to production. After all, we wanted to check the deployment mechanism during test.

We therefore decided that we needed a single automated process that did the deployment, whether it was to test, staging or production. Testing is just as much about making sure your deployment process works as it is about making sure your application works, so it was essential that we deployed the same way everywhere. Because we were using tools to drive our deployments, we could then automate the whole process: as well as automating the deployment itself, we could automate when the deployment tool is invoked. Now we’re automatically deploying every single new build from TeamCity into our test environment. Previously, I had to log on to the test environment, browse to the network share where the build zips would land, copy the latest one across, unzip it, and manually edit the files. Now, I just see that there’s been a build, browse to the test machine, and see that it’s already there.

The Benefits of an Automated Deployment

I cannot remember the last time I did a manual deployment, but it must have been February. Since introducing automated deployment, we’ve extended the approach to pretty much every project that’s seen active development since then. Once you’ve automated the deployment process, nobody ever wants to go back to doing it by hand; it’s just such a time-saver.

Whereas before we were doing maybe two deployments a week, we probably now do around ten deployments a week to production. That’s still an average: some weeks it’s five, other weeks it’s twenty. The time a deployment takes is no longer an issue. Whereas a manual, physical deployment used to take 30-45 minutes, now it’s just however long it takes for the package, usually around 30-40 MB, to be delivered over the network from the NuGet server to the production server. From the time you press the button to the time it says it’s complete is probably about 30 or 40 seconds.

Apart from the time we save, the main benefit is repeatability: being able to deploy the same way everywhere. There’s also the follow-up benefit that by saving time you can deploy that much more often, and you avoid the sort of tedious repetition that no one wants to have to go through. Also, while we want all the members of our team to have a working knowledge of how our applications are hosted and deployed, you don’t actually need that knowledge to ship. So a new starter coming into the team can work on a project and ship it at the end without first having to get over what can be a very steep learning curve.

This process allows us to have very frequent, small deployments, so there is less chance of a major problem with each one. Smaller deployments also mean we don’t have to take a full backup every time we deploy. Back in the days when we’d work for months on one release, there was a huge risk every time we deployed. Now the fear is not that we’ll break something; it’s “Hey, we haven’t deployed this for two weeks. Other changes have been made that should probably be deployed now”. And then someone just goes and presses the button.

How We Do an Automated Deployment

At the moment our team uses Subversion for our version control system. The vast majority of our development is done on trunk, so that’s the branch that all of our continuous integration process is triggered by. TeamCity notices there’s a commit, builds the application and runs the unit tests; assuming both those stages are successful, it then packages the application into a NuGet package and publishes that to our internal NuGet server, and as the last step we remotely call the Deployment Manager API and tell it to deploy the new build to a named test machine. That’s the entire workflow throughout development: we iterate and iterate, with a whole series of commits happening, until at some point we say okay, that’s a build that we want to go out. So we go into Deployment Manager, find that particular build, and say: this has been through test, please deploy it to production.
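To make that last build step a little more concrete, here is a rough PowerShell sketch of what it could look like when run by TeamCity. It assumes nuget.exe is on the path and that TeamCity’s build counter is available as BUILD_NUMBER; the feed URL, API keys, project name and, in particular, the deployment endpoint and request body are hypothetical placeholders for illustration, not Deployment Manager’s documented API.

    # Sketch of a final CI build step: pack, publish, then ask the deployment
    # server to push the new build into the test environment.
    # Feed URL, keys and the deployment endpoint below are illustrative only.
    $version = "1.0.$env:BUILD_NUMBER"    # TeamCity exposes its build counter

    # Package the compiled web application described by the .nuspec file
    & nuget pack .\MyApp\MyApp.nuspec -Version $version -OutputDirectory .\artifacts

    # Publish the package to the internal NuGet feed
    & nuget push ".\artifacts\MyApp.$version.nupkg" -Source "http://nuget.internal/feed" -ApiKey $env:NUGET_API_KEY

    # Tell the deployment tool to deploy this version to a named test machine.
    # Endpoint, header and payload are assumptions, not the product's real API.
    $body = @{ project = "MyApp"; version = $version; environment = "Test" } | ConvertTo-Json
    Invoke-RestMethod -Method Post -Uri "http://deploy.internal/api/deployments" -Headers @{ "X-ApiKey" = $env:DEPLOY_API_KEY } -Body $body -ContentType "application/json"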

During a deployment step, you can tell Deployment Manager to transform the configuration files based on the target of the deployment. An example of this might be to ensure that a connection string contains the appropriate credentials to log in to your production database.
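As a hedged illustration (the file name, connection string name and server here are made up, not taken from our real projects), a production transform using the standard .NET XDT syntax might look something like this:

    <!-- Web.Production.config: applied over Web.config when deploying to production -->
    <configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
      <connectionStrings>
        <!-- Swap the test connection string for the production one, matching on name -->
        <add name="MainDatabase"
             connectionString="Data Source=PROD-SQL01;Initial Catalog=MainDb;Integrated Security=True"
             xdt:Transform="SetAttributes" xdt:Locator="Match(name)" />
      </connectionStrings>
    </configuration>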

In the bad old days of having a manual process, that step had to be done by ‘you’: the person clicking around on the machine for half an hour during the morning’s deployment. This person would have to go and find the previous version, dig into the relevant config files, and copy and paste the settings from there into the new ones. Whereas in the new world, we have a transform file alongside our config file which says, in effect, ‘when you deploy to live, or to test or staging, you need to change this file in this way to make sure it works’. So that manual intervention is no longer necessary, and once you’ve got it right the first time (unless someone changes the user accounts, which is always possible) it’s a solved problem. These transform files form part of the build output, which is then turned into a NuGet package, which is consumed by Deployment Manager. When Deployment Manager downloads the package and extracts it onto the machine it’s deploying to, it walks the directory structure looking for any of these transform files, and if it finds any it applies them.
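Conceptually, applying a transform is the same operation that Visual Studio’s web publishing performs, and you can reproduce it yourself with the Microsoft.Web.XmlTransform library. The sketch below is not Deployment Manager’s actual code, and the paths and file names are assumptions; it just shows what ‘find a transform file and apply it’ amounts to:

    # Apply an XDT transform file to a config file (a sketch, not Deployment
    # Manager's internals). Requires Microsoft.Web.XmlTransform.dll, which ships
    # with Visual Studio and as a NuGet package; paths here are hypothetical.
    Add-Type -Path '.\tools\Microsoft.Web.XmlTransform.dll'

    $configPath    = '.\Web.config'              # config file inside the extracted package
    $transformPath = '.\Web.Production.config'   # transform for this environment

    $doc = New-Object Microsoft.Web.XmlTransform.XmlTransformableDocument
    $doc.PreserveWhitespace = $true
    $doc.Load($configPath)

    $transform = New-Object Microsoft.Web.XmlTransform.XmlTransformation($transformPath)
    if ($transform.Apply($doc)) {
        $doc.Save($configPath)                   # overwrite the config with the transformed version
    } else {
        throw "Transform '$transformPath' could not be applied to '$configPath'"
    }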

Since we’ve moved from big bang releases to small incremental releases, the number of deployments that require making changes to the database is pretty small. Where a change is necessary, we still use SQL Compare to generate both a deployment script and a rollback script. These are then passed to our DBA to be run on production. We always used to try and ensure that these were backwards compatible, but the main thing we do differently now is that we try to get the SQL changes deployed before we deploy the application changes. That way there are fewer things changing at the same time, and it’s easier to isolate breakages.
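For the database side, the pattern is easy to sketch with the SQL Compare command line, though the server names, database names and exact switch spellings below are illustrative from memory and worth checking against the SQL Compare documentation. Running the comparison in one direction gives the deployment script; running it in the opposite direction, before the change goes live, gives the rollback script:

    # Illustrative only: generate a deploy/rollback script pair with SQL Compare.
    # Server and database names are made up; verify switch names in the docs.
    $sqlCompare = 'C:\Program Files (x86)\Red Gate\SQL Compare 10\sqlcompare.exe'

    # Deployment script: what it takes to make production match staging
    & $sqlCompare /s1:STAGING-SQL /db1:MainDb /s2:PROD-SQL /db2:MainDb /ScriptFile:deploy.sql

    # Rollback script: the same comparison reversed, generated before deploying,
    # so it describes how to put production back the way it was
    & $sqlCompare /s1:PROD-SQL /db1:MainDb /s2:STAGING-SQL /db2:MainDb /ScriptFile:rollback.sql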

In the future, we’d like to get to the point where we’re making all of our changes via Deployment Manager. So we’re pretty excited to see the team adding SQL support to the product, and are hoping to try it out soon.

All in all, it’s getting to the point where somebody just presses a button.