Where’s the Ops in DevOps? Part 3

In this three-part series, guest bloggers from DevOpsGroup look at the real role of Ops in DevOps: where it changes, how it changes, and why Ops has an important part to play in the brave new world of DevOps.

The top ten DevOps operational requirements

One of the key tenets in DevOps is to involve the Operations team in the full software development lifecycle and, in particular, to ensure that operational requirements are incorporated into the design and build phases.

To make your life easier, we’ve scoured the internet to compile this list of the top ten DevOps operational requirements (okay, it was really just chatting with some of the guys down the pub, but we’ve been doing this a long time and we’re pretty sure that if you deliver on these, your Ops people will be very happy indeed).

#10 – Instrumentation

Would you drive a car with a blacked-out windscreen and no speedo? No, didn’t think so, but Operations are often expected to run applications in Production in pretty much the same fashion.

Instrumenting your application with metrics and performance counters gives the Operations people a way to know what’s happening before the application drives off a cliff. Some basic counters include things like transactions-per-second (useful for capacity), and transaction time (useful for performance).
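As a rough illustration of those two counters, here is a minimal Python sketch. The `TransactionMetrics` class and its method names are our own invention for this example, not part of any particular monitoring library; a real system would more likely publish these numbers to whatever metrics tool your Ops team already uses.

```python
import time
from contextlib import contextmanager


class TransactionMetrics:
    """Hypothetical helper that tracks transaction count and duration."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the counters can be tested deterministically.
        self._clock = clock
        self._started = clock()
        self.count = 0
        self.total_seconds = 0.0

    @contextmanager
    def timed(self):
        """Time one transaction and record it, even if it raises."""
        start = self._clock()
        try:
            yield
        finally:
            self.count += 1
            self.total_seconds += self._clock() - start

    def transactions_per_second(self):
        """Throughput since the collector was created (useful for capacity)."""
        elapsed = self._clock() - self._started
        return self.count / elapsed if elapsed > 0 else 0.0

    def average_transaction_time(self):
        """Mean duration of a transaction in seconds (useful for performance)."""
        return self.total_seconds / self.count if self.count else 0.0
```

Wrapping each unit of work in `with metrics.timed():` is enough to feed both counters.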

#9 – Keep track of dependencies

“Oh yeah, I forgot to mention that it needs [dependency XYZ] installed first.”

“Yes, the system relies on [some third party web service] – can you just open up firewall port 666 right away?”

We all understand that modern web apps rely on lots of third party controls and web services – why re-invent the wheel if someone’s already done it? But please keep track of the dependencies and make sure they’re clearly documented (and ideally checked into source control along with your code where possible).

Nothing derails live deployments like some dependency that wasn’t documented and has to be installed/configured/whatever at the last moment. It’s a recipe for disaster.

#8 – Code defensively and degrade gracefully

Related to #9 above – don’t always assume that dependencies are present, particularly when dealing with network resources like databases or web services, and even more so in Cloud environments where entire servers are known to vanish in the blink of an Amazon’s eye.

Make sure the system copes with missing dependencies, logs the error and degrades gracefully should the situation arise.
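One way to picture this is a small wrapper that tries the dependency, logs the failure and hands back a safe default. This is only a sketch: `call_with_fallback`, the logger name and the recommendations example are all hypothetical, and which exceptions you catch will depend on the client library you actually use.

```python
import logging

logger = logging.getLogger("app.dependencies")


def call_with_fallback(fetch, fallback, name):
    """Call a dependency; on a network failure, log it and return a fallback."""
    try:
        return fetch()
    except (ConnectionError, TimeoutError) as exc:
        # Log enough detail for Ops to see which dependency failed and why.
        logger.error("Dependency %r unavailable: %s", name, exc)
        return fallback


def fetch_recommendations():
    # Stand-in for a call to a third-party web service that is currently down.
    raise ConnectionError("recommendation service unreachable")


# The page still renders, just without recommendations.
items = call_with_fallback(fetch_recommendations, fallback=[], name="recommendations")
```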

#7 – Backward/forward compatibility

Existing code base with new database schema or stored procedure? New code base with existing database schema or stored procedures?

Either way, forwards or backwards, it should work just fine because if it doesn’t you introduce ‘chicken and the egg’ dependencies. What this means for Operations is that we have to take one part of the system offline in order to upgrade the other part … and that can mean an impact on our customers and probably reams of paperwork to get it all approved.
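To make that concrete, here is a small sketch of application code that tolerates both an old and a new schema. The column names (`first_name`, `last_name`, `display_name`) are assumptions made up for this example; the point is only that the code works whether or not the schema change has been rolled out yet.

```python
def display_name(row):
    """Return a user's display name from either schema version.

    Old schema (assumed): first_name and last_name columns only.
    New schema (assumed): an additional display_name column.
    """
    if row.get("display_name"):
        return row["display_name"]
    # Fall back to the old columns so this code can ship before the schema does.
    return f"{row['first_name']} {row['last_name']}".strip()
```

With code like this, the database and the application can be upgraded in either order without taking one offline for the other.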

#6 – Configurability

I once worked on a system where the database connection string was stored in a compiled resource DLL. Every time we wanted to make a change to that connection string we had to get a developer to compile the DLL and then we had to deploy it … as opposed to simply just editing a text configuration file and re-starting the service. It was, quite frankly, a PITA.

Where possible, avoid hard-coding values into the code; they should be in external configuration files that you load (and cache) at system initialisation. This is particularly important as we move the application between environments (Dev, Test, Staging, etc) and need to configure the application for each environment.
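A minimal sketch of the load-and-cache idea, assuming a JSON file per environment. The setting names and the `DEFAULTS` dictionary are invented for the example; use whatever format and keys suit your application.

```python
import json
from functools import lru_cache

# Hypothetical defaults; anything in the external file overrides these.
DEFAULTS = {"db_connection_string": "", "request_timeout_seconds": 30}


@lru_cache(maxsize=None)
def load_config(path):
    """Read settings from an external JSON file once and cache the result."""
    with open(path, encoding="utf-8") as handle:
        overrides = json.load(handle)
    return {**DEFAULTS, **overrides}
```

Changing the connection string then means editing a text file and restarting the service, with no recompilation involved.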

That said, I’ve seen systems that had literally thousands of configuration options and settings, most of which weren’t documented and certainly were rarely, if ever, changed. An overly configurable system can also create a support nightmare as tracking down which one of those settings has been misconfigured can be extremely painful.

#5 – Feature flags

A special case of configurability that deserves its own rule – feature flags. We freakin’ love feature flags.

Why? Because they give us a lot of control over how the application works that we can use to (1) easily back out something that isn’t working without having to roll-back the entire code base, and (2) help control performance and scalability.
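In its simplest form a feature flag is just a named boolean checked at runtime. The sketch below is illustrative only: the `FeatureFlags` class and the `new_search` flag are hypothetical, and in practice the flag values would come from configuration rather than a literal dictionary.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store."""

    def __init__(self, flags):
        self._flags = dict(flags)

    def is_enabled(self, name):
        # Unknown flags default to off, so new code paths stay dark until enabled.
        return bool(self._flags.get(name, False))


def search(query, flags):
    """Route to the new or legacy implementation depending on the flag."""
    if flags.is_enabled("new_search"):
        return f"new:{query}"
    return f"legacy:{query}"
```

If the new search misbehaves in Production, switching `new_search` off backs it out without rolling back the whole code base.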

#4 – Horizontal scalability (for all tiers)

We all want the product to be a success with customers, but we don’t want to waste money by over-provisioning the infrastructure upfront. We also want to be able to scale up/down if we have a spiky traffic profile.

For that, we need the application to support horizontal scalability and, for that, we need to think about this when designing the application. There are three good examples:

  1. Don’t tie user/session states to a particular web/application server (use a shared session state mechanism).
  2. Provide support for read-only replicas of the database (eg, a separate connection string for ‘read’ versus ‘write’).
  3. Offer support for multi-master or peer-to-peer replication (to avoid a bottleneck on a single ‘master’ server if the application is likely to scale beyond a reasonable server specification). Think very carefully about how the data could be partitioned across servers, and about the use of IDENTITY/AUTO_INCREMENT columns, which can generate clashing keys when more than one server accepts writes.
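The second example, separate connection strings for reads and writes, might look something like the sketch below. `ConnectionRouter` and the connection string values are assumptions for illustration; the round-robin choice of replica is just one simple policy.

```python
import itertools


class ConnectionRouter:
    """Hypothetical helper that picks a connection string per query type."""

    def __init__(self, write_dsn, read_dsns):
        self.write_dsn = write_dsn
        replicas = list(read_dsns)
        # Rotate through the replicas; with none configured, reads use the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def dsn_for(self, readonly):
        """Return a replica for read-only work, the primary for everything else."""
        if readonly and self._replicas is not None:
            return next(self._replicas)
        return self.write_dsn
```

Operations can then add or remove read replicas by changing configuration, without touching the application code.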

#3 – Automation and scriptability

One of the key tenets in the CALMS DevOps model is A for Automation (Culture-Automation-Lean-Metrics-Sharing, if you want to know the others).

We want to automate the release process as much as possible, for example by packaging the application into versionable releases or by taking the ‘infrastructure-as-code’ approach, using tools like Puppet and Chef for the underlying hardware.

But this means that things need to be scriptable.

I can remember being reduced to using keystroke macros to automate the (GUI) installer of a third party dependency that didn’t have any support for silent/unattended installation. It was a painful experience and a fragile solution.

When designing the solution (and choosing your dependencies), constantly ask yourself the question: “Can these easily be automated for installation and configuration?” Bonus points if they can. In very large scale environments (1,000+ servers), build in auto-discovery mechanisms where servers automatically get assigned roles, service auto-discovery (eg, http://curator.apache.org/curator-x-discovery/index.htm), etc.

#2 – Robust regression test suite

Another thing we love, almost as much as feature flags, is a decent set of regression test scripts that we can run on demand to help check/verify/validate everything is running correctly in Production.

We understand that maintaining automated test scripts can be onerous and painful, but automated testing is vital to an automation strategy. We need to be able to verify that an application has been deployed correctly, either as part of a software release or scaling out onto new servers, in a way that doesn’t involve laborious manual testing. Manual testing doesn’t scale.

The ideal test suite will exercise all the key parts of the application and provide helpful diagnostic messaging if something isn’t working correctly. We can combine this with the instrumentation (remember #10 above), synthetic monitoring, Application Performance Management (APM) tools (eg, AppDynamics), infrastructure monitoring (eg, SolarWinds), etc, to create a comprehensive alerting and monitoring suite for the whole system. The goal is to ensure that we know something is wrong before the customer does.
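As a sketch of what ‘run on demand with helpful diagnostics’ could mean, here is a tiny check runner. The `run_checks` function and the example checks are hypothetical; a real suite would call the application’s actual endpoints and would probably sit on top of an existing test framework.

```python
def run_checks(checks):
    """Run each named check and return (name, passed, message) for every one."""
    results = []
    for name, check in checks:
        try:
            check()
            results.append((name, True, "ok"))
        except Exception as exc:
            # Keep going after a failure so one report covers the whole system.
            results.append((name, False, f"{type(exc).__name__}: {exc}"))
    return results


def check_homepage():
    # Stand-in for a real request to the deployed application.
    pass


def check_database():
    # Stand-in for a failing dependency, to show the diagnostic message.
    raise ConnectionError("cannot reach database on port 1433")


report = run_checks([("homepage", check_homepage), ("database", check_database)])
```

Each failure carries the exception type and message, which is usually enough for Ops to know where to start looking.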

#1 – Documentation

Contrary to popular belief we (Operations people) are quite happy to RTFM. All we ask is that you WTFM (that’s W as in WRITE!).

Ideally we’d collaborate on the product-centric documentation using a Wiki platform like Atlassian Confluence as we think this gives everyone the easiest and best way to create – and maintain – documentation that’s relevant to everyone. As a minimum we want to see:

  1. A high-level overview of the system (the big picture), probably in a diagram
  2. Details on every dependency
  3. Details on every error message
  4. Details on every configuration option/switch/flag/key, etc
  5. Instrumentation hooks, expected values
  6. Assumptions, default values, etc

Summary

Hopefully, this top ten list will give you a place to start when thinking about your DevOps operational requirements, but it’s by no means comprehensive or exhaustive. We’d love to get your thoughts on what you think the key requirements are for your applications.

You might also like to read the first post in this series, DevOps does not equal ‘Developer managing Production’, and the second post, What does the future of IT Operations look like in a DevOps World?
