There were so many questions asked at a recent presentation that I just didn’t have time to answer them all. However, many of them were really interesting, so I promised to address them over time as blog posts. The next question is:
I manage an ETL team that supports highly customized code for all our processes. What is the best approach for implementing a DevOps process for a team that is separate from applications but still has a need for this type of process? Where do I start?
When Your Team Is Not Integrated
I’m not going to sugarcoat it. This challenge is difficult.
Automation by a team that is downstream of other teams and other processes isn’t going to be easy. However, just because something isn’t easy doesn’t mean it won’t bring benefits. In fact, sometimes the tougher challenges are the better ones to solve.
In this case, the hard part is actually not figuring out how to automate your own work. That part of the process should actually be pretty simple. You have a database (or databases). That database has objects. Those objects can be expressed as code. Put your code into source control. Use a tool like Redgate Automation or Flyway to create a mechanism for deployment automation through whatever flow control software you want to use. You’re off and running.
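To make the "database objects as code" idea concrete, here's a minimal sketch of what a Flyway-style versioned migration runner does under the hood. The table and migration scripts are hypothetical stand-ins, inlined instead of living in `V1__*.sql` files, and SQLite stands in for your real database; in practice you'd let Flyway or Redgate do this for you.

```python
import sqlite3

# Hypothetical versioned migrations. A Flyway-style tool would read these
# from source-controlled files like V1__create_staging.sql; they're inlined
# here so the sketch is self-contained.
MIGRATIONS = {
    1: "CREATE TABLE staging_orders (id INTEGER PRIMARY KEY, amount REAL);",
    2: "ALTER TABLE staging_orders ADD COLUMN loaded_at TEXT;",
}

def migrate(conn: sqlite3.Connection) -> list:
    """Apply any pending migrations in version order; return versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history (version INTEGER PRIMARY KEY)")
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_history")}
    newly_applied = []
    for version in sorted(MIGRATIONS):
        if version in applied:
            continue  # already deployed to this environment; skip it
        conn.execute(MIGRATIONS[version])
        conn.execute("INSERT INTO schema_history (version) VALUES (?)", (version,))
        newly_applied.append(version)
    conn.commit()
    return newly_applied

conn = sqlite3.connect(":memory:")
print(migrate(conn))  # first run applies both versions
print(migrate(conn))  # second run is a no-op: the history table says we're current
```

The key property is idempotence: because each environment records which versions it has seen, the same pipeline can deploy to dev, test, and production without re-running scripts.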
No, the challenge is going to be in coordinating your releases with the other teams. If your situation is like the data warehouses, reporting systems, and analysis platforms I’ve worked with, your team is seen as the end of the tail. It’s even possible that upstream systems will change before you’re ready, leading to errors in your ETL processes. I’ve seen this pretty frequently, and it almost always arises from a lack of communication.
So, that should be your focus: ensuring that you’ve got the communication with the other teams to see their changes coming and incorporate them into your release and testing automation. If you can’t simply communicate with these teams to know what changes are coming, then you may need to put some kind of auditing in place that lets you see what changes have been made to the systems whose data you’re actively consuming. If you can’t do that either, then I’d suggest focusing exclusively on two areas.
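One lightweight form that auditing can take is a schema snapshot you compare on a schedule: capture the columns of the upstream tables you consume, and diff the live catalog against that baseline before each ETL run. This is a minimal sketch using SQLite's catalog; the table names are hypothetical, and against SQL Server or Oracle you'd query `INFORMATION_SCHEMA.COLUMNS` or the equivalent instead.

```python
import sqlite3

def snapshot_schema(conn):
    """Capture a table -> ordered-column-list map from the live catalog."""
    tables = [t for (t,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: [row[1] for row in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}

def schema_drift(baseline, current):
    """Report tables whose columns no longer match the saved baseline."""
    drift = {}
    for table, cols in current.items():
        if baseline.get(table) != cols:
            drift[table] = {"expected": baseline.get(table), "found": cols}
    return drift

# Hypothetical upstream table: baseline it, then an unannounced change lands.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
baseline = snapshot_schema(conn)
conn.execute("ALTER TABLE orders ADD COLUMN region TEXT")
print(schema_drift(baseline, snapshot_schema(conn)))
```

Run the comparison as the first step of your ETL pipeline and fail fast with a readable message, rather than letting a surprise column break a load halfway through.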
First, make darned sure your ETL processes are very robust, with good error handling and recovery. If you’re just going to more or less get busted by changes outside your control, focus on what you can control. You can control how your code behaves, especially how it deals with errors.
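"Good error handling and recovery" in an ETL load usually means two things: retry transient failures, and quarantine bad rows instead of killing the whole run. Here's a minimal sketch of that shape; `load` is whatever writes a row to your target, and the retry counts and backoff are assumptions you'd tune.

```python
import logging
import time

def run_etl(rows, load, retries=3, backoff_seconds=1.0):
    """Load rows via load(row), retrying transient failures and
    quarantining rows that still fail, so one bad record can't
    abort the whole batch."""
    loaded, quarantined = 0, []
    for row in rows:
        for attempt in range(1, retries + 1):
            try:
                load(row)
                loaded += 1
                break
            except Exception as exc:
                logging.warning("attempt %d failed for %r: %s", attempt, row, exc)
                if attempt == retries:
                    quarantined.append(row)  # park it for later review
                else:
                    time.sleep(backoff_seconds * attempt)  # simple linear backoff
    return loaded, quarantined
```

The quarantine list is the recovery half: persist it somewhere (a holding table, a file) so the rows that were broken by an upstream change can be replayed once you've fixed the mapping, instead of being silently lost.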
Second, get your deployment automation and DevOps-style process in place. If you’re going to be getting random changes, your ability to respond is enhanced through the automation that a DevOps-style approach brings.
If you can communicate your way to a better integration with the other systems, that’s your best bet. You’d still want to automate your deployments, testing, and all the rest. However, if you can’t get the communication issues fixed, then you should still automate. Your focus will just be on automation in a reactive stance as opposed to a proactive one. That’s less than ideal, but it sure will help you deal with the situation better.