I’ve been frequently asked by people interested in DevOps: “How do I convince management that this whole automation thing is worth our time?”
It’s a great question. My answer is always the same: start by documenting your pain. However, that advice on its own may not be clear to everyone, so let me explain further.
One of the simplest things to start documenting is whether you’re experiencing downtime because of your existing deployment processes. This is clearly one of the most worrisome pain points the business has to deal with. I’ve worked with organizations where any kind of downtime resulted in the loss of serious money. I’ve got friends who worked in healthcare, where downtime could literally result in deaths. Downtime is a very serious problem.
Downtime from deployments comes in several flavors. One possibility is that the deployment itself causes the downtime. When you’re running scripts for the very first time on your production system, the chances of creating downtime are very high. Even if you’re practicing the scripts in your non-production environment before you deploy, you may still see errors in production. This is especially true if you haven’t used some kind of “shift-left” mechanism to ensure that your non-production environment is as close to production as you can make it.
The second kind of downtime is caused by the manual nature of your deployments. We used to keep a checklist on a whiteboard. We’d carefully walk through the checklist, ensuring that everything was done. However, every so often, we missed a step. One time, the boss came by to check in with us, leaned against the board, and wiped out the checklist, causing more delays and rework on the deployment.
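One practical fix is to move that whiteboard checklist into code. Here’s a minimal sketch, with hypothetical step names and placeholder bodies, of a runner that executes each step in order and halts at the first failure, so no step can be silently skipped or wiped off a whiteboard:

```python
# Sketch: a deployment checklist as code instead of a whiteboard.
# Step names and bodies are hypothetical placeholders, not a real process.

def drain_connections():
    """Placeholder: stop accepting new connections before deploying."""
    return True

def take_backup():
    """Placeholder: back up the database before changing it."""
    return True

def run_deployment_script():
    """Placeholder: apply the versioned change script."""
    return True

CHECKLIST = [drain_connections, take_backup, run_deployment_script]

def run_checklist(steps):
    """Run each step in order; stop and report on the first failure."""
    completed = []
    for step in steps:
        if not step():
            return completed, step.__name__  # what finished, what failed
        completed.append(step.__name__)
    return completed, None

done, failed = run_checklist(CHECKLIST)
print(done, failed)
```

Because the checklist is now a list in source control, adding, removing, or reordering steps is a reviewed code change rather than a marker-pen edit.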
Another kind of downtime you have to take into account is the time the deployment itself takes. Before I started automating deployments, we did it all manually: bleed off all the connections, run a backup, then deploy. Every step took a long time. As noted above, errors in any of the steps would add to the downtime, but even when everything went right, the process itself kept our servers offline.
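If you want to document this pain, measure it. Here’s a small sketch that times each phase of a manual deployment and totals the offline window; the phase names are hypothetical, and the `time.sleep` calls merely stand in for real work:

```python
import time

def timed(label, fn):
    """Run one deployment phase and record how long it took."""
    start = time.monotonic()
    fn()
    return label, time.monotonic() - start

# Hypothetical phases; sleeps simulate the duration of real work.
phases = [
    ("drain connections", lambda: time.sleep(0.01)),
    ("run backup",        lambda: time.sleep(0.01)),
    ("deploy scripts",    lambda: time.sleep(0.01)),
]

timings = [timed(label, fn) for label, fn in phases]
total_offline = sum(duration for _, duration in timings)

for label, duration in timings:
    print(f"{label}: {duration:.3f}s")
print(f"total downtime window: {total_offline:.3f}s")
```

Even a crude log like this, kept over a few release cycles, turns “deployments take a while” into a number you can put in front of management.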
All this together negatively impacts the business, so this is the first pain point I would document.
It’s entirely possible that you can deploy to your databases and never have an error, never experience data loss of any kind. If so, you’re very lucky. However, in a manual environment with minimal, or no, automated testing, it’s very easy for an error to slip in. Without that shift-left process, you may be running a script for the very first time on your production server. Also, if you’re not deploying in a Continuous Deployment methodology using lots of fast, small deployments, that script is likely to be 15,000 lines or more. I’m sure you looked through it, but you didn’t really read every single line of code.
Without automated, frequent, early testing, the chances of causing data loss are radically heightened. This quickly becomes a case of testing your ability to restore your database to a point in time. Regardless of how well you do this, you’re introducing more downtime for the business. Even worse, you may have to go and explain to the company just how much data was lost, irretrievably. Do you have automated testing for your backups? If not, they could be corrupted or missing, and you won’t even get the chance to retrieve the data you lost.
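Even a basic automated check on backup files beats discovering a corrupt one mid-restore. This sketch is a simplified stand-in for a real restore test (which would actually restore the backup and query it): it records a checksum at backup time, then verifies the file still exists, is non-empty, and matches before you ever depend on it:

```python
import hashlib
import os
import tempfile

def backup_checksum(path):
    """SHA-256 of the backup file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(path, expected_checksum):
    """Fail fast on a missing, empty, or altered backup file."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    return backup_checksum(path) == expected_checksum

# Demo with a throwaway file standing in for a real backup.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bak") as f:
    f.write(b"pretend this is a database backup")
    backup_path = f.name

recorded = backup_checksum(backup_path)       # store this at backup time
ok = verify_backup(backup_path, recorded)     # True: file intact
bad = verify_backup(backup_path, "0" * 64)    # False: checksum mismatch
print(ok, bad)
os.remove(backup_path)
```

Scheduled nightly, a check like this catches a missing or corrupted backup while you still have time to take another one.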
I’ve heard the argument made that, well, you’re paid anyway, so whatever time you spend building deployments manually is all part of the job and effectively free. However, that’s just not true. There are so many hours in a day, and nothing any of us does can change that. How you spend your day matters. You need to document how you spend your day, especially as it relates to deployments.
Like uptime, there are different pain points where the time you spend becomes an issue. The first, and easiest to define, is the time it takes you to build out the deployment script. If you’re not working out of source control and using branches or tags to identify which pieces of functionality are ready to move to production, then a considerable part of your day could be spent figuring out which parts of the database are OK to deploy and which parts are still under development. Not only can these manual processes chew up your day, but they’re also another vector for introducing error into your deployments. Moving to a mechanism whereby deployment scripts are created through automated processes can help to eliminate this pain.
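As a simplified illustration of what that automation replaces, here’s a sketch that filters a manifest of change scripts down to only those marked ready. The file names and the manifest itself are invented; in a real pipeline, that status would come from source-control branches or tags rather than a hard-coded list:

```python
# Hypothetical manifest: in practice this state would be derived from
# source control (branches, tags), not maintained by hand.
manifest = [
    {"file": "001_add_orders_table.sql", "status": "ready"},
    {"file": "002_orders_index.sql",     "status": "ready"},
    {"file": "003_reporting_views.sql",  "status": "in_development"},
]

def deployable_scripts(manifest):
    """Select only the change scripts marked ready for production."""
    return [entry["file"] for entry in manifest if entry["status"] == "ready"]

print(deployable_scripts(manifest))
# ['001_add_orders_table.sql', '002_orders_index.sql']
```

The point is that “which parts are safe to deploy” becomes a query over recorded state instead of a judgment call you re-make, by hand, every release.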
Another big chunk of your time is going to be spent on environment refreshes as well as manual deployments. The QA team would like a new copy of production. If you have yet to automate the refresh process, you must complete many manual steps to fulfill this request: get a backup from production, restore it to QA, and deploy the code after building it manually. You must also run some scripts to clean the data before handing it over to QA. (Please tell me you do this, even if you’re not subject to a compliance regime like the GDPR or the CCPA; you shouldn’t have production email addresses in non-production environments, and that’s just one example.) Multiply this effort across multiple development streams, multiple teams, and multiple environments, and suddenly you’re spending a lot of your time dealing with deployments.
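The data-cleaning step is also straightforward to automate. Here is a minimal sketch, with invented column names and sample rows, that replaces identifying fields with deterministic fake values before the copy goes to QA:

```python
def mask_row(row, index):
    """Replace identifying fields with deterministic fake values.

    Column names here are hypothetical; a real masking script would be
    driven by a list of sensitive columns per table.
    """
    cleaned = dict(row)
    cleaned["email"] = f"user{index}@example.invalid"
    cleaned["name"] = f"Test User {index}"
    return cleaned

production_rows = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@corp.example"},
    {"id": 2, "name": "Alan Turing",  "email": "alan@corp.example"},
]

qa_rows = [mask_row(row, i) for i, row in enumerate(production_rows)]
print(qa_rows[0]["email"])
```

Deterministic values (rather than random ones) keep refreshed environments consistent between runs, which makes QA results reproducible.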
The most important thing to remember is that all the time you document spending on deployments is time you could instead spend automating those same deployments, tuning queries, or, most importantly, helping build out new functionality for the organization.
Velocity of Change
Silly terms like “at the speed of business” can be bandied about, but the fact of the matter is, in a rapidly changing world, your organization needs to change. This means code has to change, and your database has to change. One more piece of pain you have to document is just how long it takes to get a change out the door successfully. Please note, the key word in that sentence is “successfully”. We can rapidly deploy the wrong thing or something that doesn’t work. It takes a little more time to ensure that things go successfully.
You know when new functionality requests come in the door. Start the clock to understand just how long it takes to deliver them to production. Document how much of that time is spent deploying the database, resetting the database, building the deployment process, or dealing with the problems caused by deployments. This collective time is a cost added to everything your organization does, making the entire thing less efficient. If nothing else I’ve suggested above will sway people, this measure alone may.
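You can keep this measurement honest with almost no tooling. Here’s a sketch, with invented timestamps and hours, that computes the lead time from request to successful deploy and the fraction of it consumed by deployment work:

```python
from datetime import datetime

def lead_time_hours(requested, deployed):
    """Elapsed hours from feature request to successful production deploy."""
    return (deployed - requested).total_seconds() / 3600

def deployment_share(total_hours, deployment_hours):
    """Fraction of the lead time consumed by deployment work itself."""
    return deployment_hours / total_hours

# Hypothetical numbers for a single feature request.
requested = datetime(2023, 3, 1, 9, 0)
deployed = datetime(2023, 3, 11, 9, 0)   # ten days later

total = lead_time_hours(requested, deployed)
share = deployment_share(total, 36.0)    # hours logged on deployment tasks
print(f"lead time: {total:.0f}h, deployment overhead: {share:.0%}")
```

Tracked across a quarter’s worth of requests, that overhead percentage is exactly the kind of number that sways a management conversation.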
If it looks like I’m equating time and pain, I am. There’s a classic saying:
You can have something quick, cheap, or good. Pick two.
To a degree, this is true. However, if you’re willing to spend some time (because time is always the hard part) to automate your processes and adopt a DevOps mentality and methodology, you can arrive at a place where you’re delivering good functionality quickly. You just have to make the investment to get there. To begin to deliver this message, show how you’re currently satisfying only one of the three, not two: you’re delivering things neither quickly nor cheaply because of all the time and effort. Further, you may be missing out on all three if you’re also experiencing a lot of downtime and recovery.
While there are many benefits to the adoption of DevOps, the ability to reclaim time so that you can spend it where it should be spent, building systems and delivering value, is arguably the most important.