Webinar Q&A with Dave Farley – Should your database be Continuously Delivered?


With Dave Farley & Elizabeth Ayer, and host, Stephanie Herr

If you want to check out our previous webinars, the recordings are available in our DLM Webinar playlist.

Webinar Summary

THE Continuous Delivery expert, Dave Farley, was grilled by Elizabeth Ayer, Redgate Product Manager, about databases and Continuous Delivery. Dave is co-author, with Jez Humble, of the Continuous Delivery book, which has made a huge difference in how software is delivered.

The webinar recording is available at https://www.youtube.com/watch?v=PgOHkVXM-yM. Dave explained what Continuous Delivery is: “having an idea and getting it into the hands of users as efficiently as we can.” It is not Continuous Deployment, where every single change gets pushed through to Production environments. It’s the state of being ready to release at any time. Some companies do this multiple times a day (e.g. Amazon deploys once every 11 seconds) and others release once every couple of weeks. It really depends on the business need.

A deployment pipeline is your process of releasing software. Every change is a release candidate. Continuous Delivery makes the process repeatable, reliable, and efficient thanks to automation.

Continuous Delivery sounds easy, but it’s a massive culture change. It doesn’t just impact the development teams. It relates to testing, operations, the business, and more. It allows you to experiment. Dave was very positive about making small incremental changes on your journey towards Continuous Delivery. Just getting things in version control is a great first step and will bring benefits.

Now, to the main question: what about the database? Dave said the database is not special. I disagree. It is special. It holds your all-important data. There are permission issues and data sensitivity issues, and you can’t just overwrite what is out there. Dave went on to explain that it’s not special in the sense that it should still be highly tested and automated. (Ok, I agree with that.) He recommended the book Refactoring Databases by Scott Ambler and Pramod Sadalage to help with developing database changes in a small, incremental way. In his past experience, Dave has used delta scripts with great success: a version table in the database records which transitions have been applied, and the deployment process applies whichever scripts are needed to bring the database up to the version you want. He also uses a lightweight set of data to test that the transitions do what he expects. (You can read more about database unit testing vs migration testing in a write-up by Jonathan Hickford, another Redgate Product Manager, from when he met Dave a few months earlier.) Dave also recognized the need to think about architecting the database as a microservice to help with small changes.
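To make that testing idea concrete, here is a minimal sketch of checking one transition against a lightweight set of data before it goes anywhere near production. It uses Python with SQLite purely for illustration; the table name, delta script, and test are hypothetical, not Dave’s actual scripts.

```python
import sqlite3

# Hypothetical delta script: add first/last name columns to a customer table.
DELTA_002 = """
ALTER TABLE customer ADD COLUMN first_name TEXT;
ALTER TABLE customer ADD COLUMN last_name TEXT;
"""

def apply_delta(conn, delta_sql):
    """Apply one transition script to the database."""
    conn.executescript(delta_sql)

def test_delta_002_preserves_existing_rows():
    # Build a throwaway database with a small, representative data set.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO customer (id, name) VALUES (1, 'Ada Lovelace');
    """)

    apply_delta(conn, DELTA_002)

    # The transition should add the new columns without losing existing data.
    row = conn.execute(
        "SELECT name, first_name, last_name FROM customer WHERE id = 1"
    ).fetchone()
    assert row == ('Ada Lovelace', None, None)

if __name__ == "__main__":
    test_delta_002_preserves_existing_rows()
    print("delta 002 behaves as expected")
```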

Here are some questions that attendees asked, but we didn’t have time to cover during the webinar:

How can we overcome the “we’ve always done it that way” group-think mentality?

Dave: For me the question is, “Is your process working now as well as you want it to?” If not, I think you should try something else.

I believe that we have found a better way to deliver valuable, high-quality, software to the organizations that employ us. The trouble is that it is a very different way of working. Mostly people are very wary of change, particularly in software development, where we have promised a lot before and not delivered.

The only way I know to move a “group-think” position is gradually. You need to make a positive difference and win trust. It is about looking at real problems and solving them, often one at a time.

I believe that we, the software industry, are in a better place than we were, because we finally have the experience to know what works and what does not. The trick now is to migrate to the approaches that work. This takes learning, because the new approaches are very different from the old and challenge old assumptions. It is helpful to get some guidance: hire people that have some experience of this new way of working, read the literature, and carry out small, controlled experiments in areas of your process and business that will make a difference.

I often recommend to my clients that they perform a “Value-stream analysis” to figure out where they are efficient at software delivery and where they are not. This is often an enlightening exercise, allowing them to easily spot points that can be improved. Sometimes this is technology, more often it is about getting the right people to communicate effectively.

Once you have solved this problem, you will have improved the situation and gained a little “capital”, in the form of trust, that will allow you to challenge other sacred cows. This is a slow process, but for a pre-existing organization it is the only way that I know.

What advice would you have for gaining management buy-in for continuous delivery?

Dave: Continuous Delivery is well-aligned with management ambitions. We optimize to deliver new ideas, in the form of working software, to our users as quickly and efficiently as possible. The data from companies that have adopted CD is compelling: it improves their efficiency and their bottom line. Many of the most effective software companies in the world employ CD.

The problem is not really the ideals, it is the practice: what it takes to get there. CD organizations look different from others. They tend to have many small teams instead of fewer large ones. Each team has a very high degree of autonomy; many don’t really have “management” in the traditional sense. So this can be very challenging to more traditional organizations.

The good news is that the way to adopt CD is by incremental steps. Each of these steps is valuable in its own right, and so each can be seen as a positive step. If you don’t use version control – start. If you don’t work iteratively, and regularly reflect on the outcomes of your work so that you can correct and improve – start that. If you don’t employ test automation, deployment automation, or effective configuration management, start those things too. Each of these steps will bring a different benefit, and over time they reinforce one another, so you get more than the sum of the parts.

There are several CD maturity models which can offer guidance on what to try next. There is one in the back of my book, and here is another that I have used: http://www.infoq.com/articles/Continuous-Delivery-Maturity-Model

We are very early in the stages of DB CD process changes, what are the most important issues to tackle early?

Dave: That is quite a tough question to answer without the consultant’s lament, “It depends” 😉

I think that the fundamental idea that underpins CD is to take an experimental approach to everything: technology, process, organization, the lot. Try new things in small controlled steps so that if things go wrong you can learn from it rather than regret it.

At a more technical level, I think that version controlling pretty much everything, automated testing, and continuous integration are cornerstones. If you are starting from scratch, it is much easier to start well with automated testing and continuous integration than to add these later. It is not impossible to add them later; it is just more difficult.

So be very strict with yourselves at first and try working so that you don’t make ANY change to a production system without some form of test. This will feel very hard at first if you are new to this, but it really is possible.

Are there any best practices you’d especially recommend we bear in mind?

Dave: There is a lot to CD. I tend to take a very broad view of its scope and so it encompasses much of software development. At that level the best practices are grounded in Lean and Agile principles. Small, self-directed teams, working to very high quality standards, employing high levels of automation for tests and deployment are foundational.

At the technical level there are lots, at all different levels of granularity. I guess the key idea from my book is the “Deployment Pipeline”: the idea of automating the route to production. A good mental model for this is to imagine that every change that is destined for production gives birth to a release candidate. The job of the deployment pipeline is to prove that a release candidate is NOT FIT to make it into production. If a test fails, we throw the release candidate away.
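As a loose sketch of that mental model (the stage names and commands below are hypothetical placeholders, not a prescription), a pipeline rejects a candidate at the first stage it fails:

```python
from dataclasses import dataclass

@dataclass
class ReleaseCandidate:
    """Every change destined for production gives birth to one of these."""
    commit: str

def run(command, rc):
    # Placeholder: in a real pipeline your build server runs this and reports pass/fail.
    print(f"running {command} for {rc.commit}")
    return True

def commit_stage(rc):
    # Fast feedback: compile, unit tests, static analysis.
    return run("./build-and-unit-test.sh", rc)

def acceptance_stage(rc):
    # Automated acceptance tests against a deployed build.
    return run("./run-acceptance-tests.sh", rc)

def deployment_pipeline(rc):
    """The pipeline's job is to prove a release candidate is NOT fit for production."""
    for stage in (commit_stage, acceptance_stage):
        if not stage(rc):
            print(f"{rc.commit} rejected at {stage.__name__}: throw the candidate away")
            return False
    print(f"{rc.commit} has survived every stage and is releasable")
    return True

deployment_pipeline(ReleaseCandidate(commit="abc123"))
```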

What are some common practical issues that people encounter during the implementation of CD?

Dave: I have covered some of this in the preceding answers. Most of the problems are people problems. It is hard to break old habits. At the technical end, the most common problems that I have seen have been very slow, inefficient builds; poor, or non-existent, automated deployment systems; and poor, or non-existent, automated tests.

What would be the fastest way to actually perform CD?

Dave: The simplest way is to start from scratch, with a blank sheet. It is easier to start a new project or new company this way than to migrate an existing one.

I think it helps to get help from people that have done this before. Hire people with these skills and learn from them.

We deal with HIPAA-regulated data and I am personally unsure of letting this data out. How does CD typically get implemented in highly regulated environments? Are there particular challenges?

Dave: The only challenge that I perceive is that regulators are often unfamiliar with the ideas, and so their assumptions of what good regulatory conformance looks like are shaped by what, to me, looks like an outdated picture of development practice.

My experience of working in heavily regulated industries, mostly finance in different countries, is that the regulators quickly appreciate this stuff and they *love* it.

CD gives almost ideal traceability. Because of our very rigorous approach to version control and the high levels of automation that we employ, we get FULL traceability of every change, almost as a side-effect. In the organizations where I have worked in the finance industry, we have been used as benchmarks for what good regulatory compliance looks like.

So the challenge is educating your regulators; once they get it, they will love it.

How should a data warehouse deal with a source database which is in a CD pipeline?

Dave: As usual, it depends. The simplest approach is to treat it like any other part of the system and write tests to assert that changes work. Run these tests as part of your deployment pipeline.

If that isn’t possible, you need to take a more distributed, microservice-style approach. In this approach, try to minimize the coupling between the data warehouse and the upstream data sources. Provide well-defined, general interfaces to import data and make sure that these are well tested.

How do you recommend we use CD to synchronize, deploy, and verify complex projects with databases, Agent Jobs, SSIS packages, and SSRS reports?

Dave: I would automate the integration of new packages as part of my deployment pipeline. I would also look to create automated tests that verify each change, and run these as part of my pipeline.

How would you deal with multiple versions of a database (e.g. development, internal final test, and a version for the customer), and do you have any advice for the automated build and deployment of a database?

Dave: I recommend the use of the ideas in “Refactoring Databases” by Scott Ambler and Pramod Sadalage.

Do you have any tips for enabling rapid DB ‘resets’ during build/test? E.g. How to reset DB to known state before each test?

Dave: A lot depends on the nature of the tests. For some kinds of test (low-level, unit-like tests) it can be good to use transactional scope for the duration of the test. At the start of the test, open a transaction; do what you need for the test, including any assertions; at the end of the test, abort the transaction.
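A minimal sketch of that transactional-scope idea, using Python’s built-in sqlite3 module purely for illustration (the orders table and the test are hypothetical):

```python
import sqlite3

def run_in_transaction(conn, test_body):
    """Run a test inside a transaction, then abort it so the DB returns to its known state."""
    try:
        test_body(conn)   # changes and assertions happen inside the open transaction
    finally:
        conn.rollback()   # abort: every change the test made is thrown away

# Hypothetical test against a small 'orders' table
def test_inserting_an_order(conn):
    conn.execute("INSERT INTO orders (customer, total) VALUES ('alice', 42)")
    assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1

conn = sqlite3.connect(":memory:")   # with default settings, the INSERT opens a transaction
conn.execute("CREATE TABLE orders (customer TEXT, total INTEGER)")
conn.commit()                        # the empty table is our known starting state

run_in_transaction(conn, test_inserting_an_order)

# After the rollback the database is back in its known state
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0
```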

For higher-level tests I like to use functional isolation, where you use the natural functional semantics of your application to isolate one test from another. If you are testing Amazon, every test starts by creating a user account and a book. If you are testing eBay, every test starts by creating a user account and an auction….
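For illustration, a tiny sketch of functional isolation in that eBay style; the account and auction helpers here are hypothetical stand-ins for your application’s real interfaces:

```python
import uuid

# Hypothetical application API used by the tests; in a real suite these calls
# would go through your application's own interfaces or services.
def create_account(name):
    return {"id": str(uuid.uuid4()), "name": name}

def create_auction(seller, item):
    return {"id": str(uuid.uuid4()), "seller": seller["id"], "item": item, "bids": []}

def place_bid(auction, bidder, amount):
    auction["bids"].append({"bidder": bidder["id"], "amount": amount})

def test_placing_a_bid():
    # Functional isolation: this test creates everything it needs, so it never
    # collides with data created by any other test running against the same system.
    seller = create_account("seller-" + str(uuid.uuid4()))
    bidder = create_account("bidder-" + str(uuid.uuid4()))
    auction = create_auction(seller, "vintage keyboard")

    place_bid(auction, bidder, 50)

    assert auction["bids"][0]["amount"] == 50

test_placing_a_bid()
```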

You can see me describing this in more detail in this presentation – I am speaking more generally about testing strategies and not specifically about the DB, but I think that the approach is still valid. https://vimeo.com/channels/pipelineconf/123639468

I’m concerned about big table rebuilds not being spotted until upgrade night.  Also obscure feature support like FILESTREAM. Do you have any tips for avoiding these kinds of last-minute surprises or dealing with a wide mix of systems?

Dave: I tend to try to treat all changes the same. I don’t like surprises either, so I try to find a way to evaluate every change before it is released into production. So I would try to find a way to automate a test that would highlight my concerns, and I would run this test in an environment that was sufficiently close to my production environment to catch most failures that I would see there.

Do you have any advice for achieving zero-downtime upgrades and non-breaking online database changes?

Dave: I have seen two strategies work. They are not really exclusive of one another.

1) The microservice approach: keep databases scoped to single applications and create software that is tolerant of a service not being available for a while. I have done some work on an architectural style called “Reactive Systems” which promotes such an approach.

2) Work in a way that every change to your database is additive. Never delete anything, only add new things, including schema changes and transactional data. So ban the use of UPDATE and DELETE 😉
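As a rough illustration of that additive style (a hypothetical customer-address history, sketched in Python with SQLite), an “update” becomes just another INSERT, so nothing is ever overwritten or deleted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Additive style: instead of updating a customer's address in place, every change
# is a new row, so history is never destroyed.
conn.executescript("""
    CREATE TABLE customer_address (
        customer_id INTEGER,
        address     TEXT,
        recorded_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    INSERT INTO customer_address (customer_id, address, recorded_at)
        VALUES (1, '1 Old Street', '2015-01-01');
    INSERT INTO customer_address (customer_id, address, recorded_at)
        VALUES (1, '2 New Road', '2016-03-01');   -- the "update" is just another INSERT
""")

# The current address is simply the most recent row; the full history comes for free.
current = conn.execute("""
    SELECT address FROM customer_address
    WHERE customer_id = 1
    ORDER BY recorded_at DESC
    LIMIT 1
""").fetchone()[0]
print(current)   # 2 New Road
```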

How do you craft data repair scripts that flow through various development environments?

Dave: I generally encode any changes to my database as a delta. Deployment starts from a baseline database image and from then on changes are added as deltas. Each copy of my database includes a table which records which delta version it is at. My automated deployment scripts interrogate the DB to see which version it is at, look at the deltas to see which is the newest, and apply all of the deltas between those two numbers. This approach is described in more detail in Pramod and Scott’s book.

I think of the delta table as describing a “patch-level” for my DB. So two DBs at the same “patch-level” will be structurally identical, though they may contain different transactional data.
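To make the mechanism concrete, here is a minimal sketch of such a delta runner, written in Python with SQLite purely for illustration; the version table name, the delta scripts, and their numbering are hypothetical, not a prescription:

```python
import sqlite3

# Hypothetical delta scripts, keyed by the patch-level they bring the database to.
DELTAS = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);",
    2: "ALTER TABLE customer ADD COLUMN email TEXT;",
    3: "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);",
}

def current_version(conn):
    """Interrogate the DB for its current patch-level (0 if the version table is empty)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0

def migrate(conn, target=None):
    """Apply every delta between the DB's current version and the target version."""
    target = target if target is not None else max(DELTAS)
    for version in range(current_version(conn) + 1, target + 1):
        conn.executescript(DELTAS[version])
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        print(f"applied delta {version}")

conn = sqlite3.connect(":memory:")
migrate(conn)                      # brings a baseline DB up to the newest patch-level
print(current_version(conn))       # 3
```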

What are some of the community-supported open source CD applications that would work well for an enterprise org that currently doesn’t have CD?

Dave: If you are going to take CD seriously you are going to want to create a pipeline and so coordinate different levels of testing for a given release candidate. So build management systems are a good starting point; Jenkins, TeamCity, and Go from ThoughtWorks are effective tools in this area.

I think that the tools for automated testing of DBs are still relatively immature, most places that I have seen use the testing frameworks from application programming languages and grow their own tools and techniques from there.

Redgate has tools for versioning DBs. I haven’t used them myself, but they have a good reputation. My own experience is that, up to now, I have used conventional version control systems, Subversion or Git, and stored scripts and code for my DB there.

Sponsor

This webinar was sponsored by ReadyRoll, which provides a better way to develop databases in Visual Studio. It supports the delta approach that Dave discussed and gives you full control over your database deployments.

Wrap up

A huge thanks to Dave for spending time with us and sharing some of his experience about Continuous Delivery. If you think of any more questions, please comment below. You can learn more about DLM and sign up for future webinars at http://www.red-gate.com/products/dlm.