Regular Rapid Releases: An Agile Tale

While developing their new SQL Source Control tool, a team at Red Gate learned a lot about Agile development, as well as the benefits (and challenges) of rapid, regular Early Access releases. Stephanie Herr (project manager for SQL Source Control) and Mark Wightman (head of development at Red Gate Software) tell us more.

SQL Source Control is an add-in for SQL Server Management Studio (SSMS) that allows database professionals to source control their databases in their existing source control systems, which gives them the ability to track who changed what, when, and why. In this article, we’ll talk about how we created Version 1.0 of this tool using Scrum and Agile techniques here at Red Gate. We’ll describe how incremental development allowed us to get extensive user feedback, by deploying quick releases to our users. We achieved this by focusing on our product backlog, and by automating our testing – all of which we’ll discuss. Finally, we’ll take a look at our story estimations, which helped us make better decisions across the project.

Hopefully you’re already familiar with what source control is. What SQL Source Control does is bring all the benefits of source control into the database development environment.

Let’s begin by looking at the development process.

Guiding Principles

Right from the start, we had three main principles that guided us throughout the entire development process.

First and foremost, it was important to have an ingeniously simple design, in line with the rest of our products: we knew that people weren’t going to adopt this tool if it was hard to use, added any additional overhead, or made them change the way they worked. Currently, there are very few tools out there for database source control, and people don’t (as a matter of course) use source control in this context. So, if our customers were going to adopt our new tool, we had to create something that completely integrated into their normal working environment.

Secondly, we wanted to get the minimal set of valuable functionality out to users as soon as possible, partly to garner fine-grained feedback early on in the process, and partly because we needed to get the product out into the market place quickly (we’ll discuss this later).

Finally, our sprints were two weeks long and, in order to get each new iteration of the tool out to users quickly, we wanted to be able to release right after each sprint ended. In order to do that, we needed to minimize the amount of technical debt we were carrying forward (such as bugs introduced in a previous sprint), and automate testing as much as possible, bearing in mind that our confidence in the stability of the releases was as important as maintaining our momentum.

So, to summarize our guiding principles:

  • Simple, integrated design (without which we were dead in the water).
  • Rapid releases, providing the minimum functionality needed for users to effectively use features of the system (all the better for early, realistic feedback).
  • Regular releases after every sprint, requiring automated testing and high confidence in the stability of each new build.

We’re going to take a look at the hurdles we faced in doing it this way, and what the benefits were, as well as the repercussions.

User Feedback and Early Access Releases

From previous experience, we’ve learnt that we can’t always trust our understanding of customer requirements. We might think they need one piece of functionality and spend a lot of time implementing it, but until the users actually try the tool, we don’t know whether what we’ve built fits their purpose. An intensive Early-Access Program, getting minimum viable releases of the product (as it’s being built) out to customers as quickly as possible and encouraging early and regular feedback, dramatically reduced the risk of building the wrong product.

Another recurring software development problem is that recreating realistic test environments is pretty hard. While we have labs set up with different configurations, we can’t test every possible combination, and we’re aware that the test databases we have are probably more simplistic than real customer databases. Early and regular releases of the tool, getting it installed across a variety of users’ environments, would also go a long way to increasing confidence in the tool’s stability.

That’s all very well on paper, but getting releases out fast and collecting user feedback as effectively as possible took a significant amount of forward planning.

The Early Access Timeline

The project began back in December 2008, when we spent a month doing an initial prototype of what this tool might look like. It was knocked together in a hurry, but it was a way to get something set up in our own environment so that we could run usability tests. We ran a handful of these tests, and the feedback was already telling us that the tool needed to be integrated into SSMS, because that’s where most database developers like to work when they make changes to their databases. So, right away, a kind of early access release was shaping the tool.


Figure 1: SQL Source Control development timeline

In March 2009, we were able to start fleshing out the user interface (UI) designs and preparing the backlog (we’ll talk about the backlog in more detail later). Our first sprint kicked off at the end of August 2009, and in about three months we had the minimum functionality necessary to do some more detailed usability sessions: the ability to add a database to source control, commit database changes, and retrieve the changes for one supported source control system, Subversion. (We chose Subversion based on polls we had run over the previous two years, which showed it to be the most popular source control tool, and still growing in popularity.) We were able to get useful feedback on those features and, about two months later, we had our first ‘real’ Early-Access Release, which we could give to end users to install in their environments, in the hope of getting more detailed feedback.

Between January 2010 and the final release in June 2010, we had six separate Early-Access Releases, as well as two beta releases, and a release candidate, so we were, on average, releasing a new version with additional functionality every three weeks for our users to install and evaluate.

Collecting the Feedback

Of course, all that early access is only as good as the feedback it generates, so let’s look at the different mechanisms we used to collect that feedback.

We ran usability tests: essentially, remote sessions in which we watched users try the product for the first time, and saw where their pain points were.

Within the product, we incorporated links to UserVoice, a website that allows people to suggest new features or enhancements, and then vote on those suggestions. This really helped us to prioritize what we needed to do next, and see which features were really important to our users.

We also took the time to incorporate SmartAssembly. If you’re not aware of it, SmartAssembly is a tool that handles source code obfuscation and, more importantly from our point of view, automated error reporting. Whenever there was a bug or error in the tool, a dialog box with the error details would pop up, and users were able to send that information back to us. The error report also contained the detail the developers needed to debug the issue, so that we could incorporate a fix into the next early release.


Figure 2: A SmartAssembly error dialog box

As you might imagine, that really helped with the stability of the tool. Without SmartAssembly, it’s likely that users would have tried the tool, hit a bug, and then given up without getting back to us.
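To illustrate the general idea of automated error reporting (a conceptual sketch only, written in Python for brevity; it is not SmartAssembly’s actual mechanism, and the report fields and build number are made up), a top-level exception hook can gather the stack trace and some environment details, then ask the user whether to send them back:

```python
import platform
import sys
import traceback

def report_unhandled_exception(exc_type, exc_value, exc_tb):
    """Conceptual top-level handler: gather details, then (with the user's consent) send them."""
    report = {
        "exception": exc_type.__name__,
        "message": str(exc_value),
        "stack_trace": "".join(traceback.format_exception(exc_type, exc_value, exc_tb)),
        "os": platform.platform(),      # environment details help reproduce the bug
        "app_version": "1.0.0.123",     # hypothetical build number
    }
    # A real tool would show a dialog asking permission, then POST the report
    # to an error-collection service; here we just print it.
    print("An unexpected error occurred. Send this report to the developers?")
    print(report)

# Install the hook so any unhandled exception produces a report instead of a silent crash.
sys.excepthook = report_unhandled_exception
```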

We also had our usual support channels, including support forums, e-mail chats, and a phone number that people could call if they had any issues.

So how did we decide what features went into each release so we could collect feedback from our users? Well, for starters, in order to get these releases out to our users quickly, we really needed to take a look at the product backlog and ask ourselves “what’s the smallest, most basic feature we can work on?”

The Product Backlog – Keeping it Small

Something that helped us with this minimalist approach was to really focus on our product backlog, and have small stories (albeit as components of something ultimately more akin to a large epic), which would let us implement the smallest unit of valuable work possible.

We needed to build up the product feature by feature, enabling a coherent workflow, so that we would have a working product at the end of each sprint. We therefore needed a backlog of small, well-written stories. To achieve this, we put more up-front effort into creating that backlog than we have done in the past, and we tried to have a product backlog that was fairly comprehensive from the start of the project.

Basically, we were looking for opportunities to split stories, by saying “this part of the story is critical”, “this part we can split out and perhaps prioritize lower”, or even “let’s wait until we’ve got positive evidence that we need this bit of functionality before we implement it”.

Vertical vs. Horizontal

Instead of developing horizontal slices of the product, we took vertical slices. This meant that a user could go from the beginning all the way through to the end of a specific scenario. All the parts may not have been fully polished, which would have been the horizontal aspect of the product, but they could still complete valuable tasks and workflows.

The first few features we addressed were linking the database to Source Control, committing changes, and getting changes out. To keep it small, this didn’t take into account anything complex, like conflict handling or merging, and we also focused our attention on only one source control system at a time (although we now support multiple source control systems).
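To make that first vertical slice concrete, committing a database to source control conceptually means scripting each schema object out to a file in a working copy and committing those files. The sketch below is only an illustration of that workflow under assumed names (the `script_out_objects` helper, the working-copy path, and the hard-coded scripts are all hypothetical); it is not how SQL Source Control itself is implemented.

```python
import subprocess
from pathlib import Path

WORKING_COPY = Path("C:/svn/AdventureWorks")  # hypothetical Subversion working copy

def script_out_objects(database):
    """Hypothetical helper: yields (relative_path, creation_script) per schema object."""
    yield "Tables/dbo.Customers.sql", "CREATE TABLE dbo.Customers (Id int NOT NULL);"
    yield "Procedures/dbo.GetCustomer.sql", "CREATE PROCEDURE dbo.GetCustomer AS SELECT 1;"

def commit_database(database, message):
    # Write one .sql file per object into the working copy...
    for relative_path, script in script_out_objects(database):
        target = WORKING_COPY / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(script)
    # ...then let the source control system record who changed what, when, and why.
    subprocess.run(["svn", "add", "--force", str(WORKING_COPY)], check=True)
    subprocess.run(["svn", "commit", "-m", message, str(WORKING_COPY)], check=True)

commit_database("AdventureWorks", "Add Customers table and GetCustomer procedure")
```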

It’s worth bearing in mind that making a vertically consistent tool from day 1 makes early access users easier to come by, as you’re immediately offering them something they can use effectively, rather than something that causes them more pain. When users can see that they’re going to get something tangible from working with you, they’re much more willing to help.

Benefits and Challenges of Keeping It Small

Creating many small stories had its advantages and disadvantages…

On the plus side, our particular focus on the backlog meant that the stories were well written and we had a good understanding of the overall shape and size of the project. Because we had so many small stories, we were able to rapidly change priorities based on what we heard from users. Moreover, because the stories were at such a low level, we could define them better and give them very clear acceptance criteria, which made them easier to estimate. This was important for keeping our velocity high, as it meant that we knew when something was done, and we weren’t polishing features when we weren’t sure they were exactly what our users needed or wanted.

Of course, there were some drawbacks to having small stories. It took a lot of time and effort to define them, and it generally didn’t feel very “Agile”, particularly at the beginning of the project, since we were spending a lot of time detailing later stories that weren’t at the very top of the backlog. And there were, by necessity, so many small stories to manage – we had over a hundred on our product backlog. Also, ideally stories should be independent of each other so that they can individually be pulled in or out at any time; however, because our stories were so small, some dependencies were unavoidable and this was difficult to manage.

That’s enough theory; now we’ll look at an example of what happened in practice.

Applying the Theory

The feature we’ll look at is “adding a database to source control”; Figure 3 shows the original UI design that was created at the start of the project.


Figure 3: Our original UI design for SQL Source Control

You can see that we had lots of options planned for the UI. We were able to break those down into really small stories, to be implemented individually, as shown in Figure 4.


Figure 4: Fine-grained story point decisions

At the end of sprint two we had only completed the very basic “add database to Subversion” functionality. So, about four weeks into the project, users had to copy and paste their repository URL into a text box (see Figure 5). The URL had to be entered correctly: it only worked for Subversion, there was no validation, no ability to browse, and no ability to create folders. The user couldn’t even enter credentials, because the credentials were initially hard-coded. What was important was that, at that early stage, we had implemented enough to allow us to do usability sessions.


Figure 5: The minimum viable functionality – entering the repository URL

Once we added the ability for users to enter their own credentials in sprint nine, we were ready for our first Early Access Release.

It was sprint 13 before we decided to start supporting another source control system (Team Foundation Server). The UI was still just a set of very basic text box inputs. In sprint 14, we polished the feature up by remembering which source control system the user had last used and defaulting to it, and by remembering previously entered URLs, to make linking easier.


Figure 6: Sprint 13, adding TFS support

Version 1.0 was finally released on June 30th, 2010. It didn’t look anything like the original UI designs, but it was all that was needed to get customers using our tool. UI polish had to be prioritized against other work and other considerations; for example, linking a database to source control is a one-off task, so we wanted to focus on the features that could have a larger ongoing impact on our customers.

Automating Testing

As we’ve already mentioned, there was quite a lot of work needed to make our fairly ambitious release plans even remotely feasible.

We had a real challenge testing this product. There was complexity in the sheer number of environments we had to support: five different operating systems, seven different versions of SQL Server (factoring in different service packs), different versions of the SSMS that the product is embedded in, and five different versions of Subversion and Team Foundation Server… 175 distinct environments, all told.

That’s not to say we were ever going to test the full set of environments, but there was clearly complexity there that we had to handle.
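As a rough illustration of where a figure like 175 comes from, the environment count is simply the product of the dimensions you choose to vary, which is why it grows so quickly. The version lists below are placeholders, not the team’s actual matrix:

```python
from itertools import product

# Placeholder dimensions: roughly 5 operating systems x 7 SQL Server releases
# x 5 source control servers gives 175 combinations.
operating_systems = ["XP", "Vista", "Win7", "Server 2003", "Server 2008"]
sql_servers = ["2000 SP4", "2005 RTM", "2005 SP2", "2005 SP3",
               "2008 RTM", "2008 SP1", "2008 R2"]
scm_servers = ["SVN 1.4", "SVN 1.5", "SVN 1.6", "TFS 2008", "TFS 2010"]

environments = list(product(operating_systems, sql_servers, scm_servers))
print(len(environments))  # 175 distinct environments

# In practice only a representative subset gets automated coverage, e.g. nightly
# runs on a handful of machines chosen to cover each dimension at least once.
```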

Without automation, we would definitely have been looking at several weeks of manual regression testing per release. In that situation, we would not have been able to achieve a release program of the intensity we were hoping for, despite knowing all the benefits that it would bring, and we would definitely have suffered a high cost for supporting additional platforms (new versions of SQL Server, new versions of Subversion and Team Foundation Server, new source control systems, and so on).

So we put a lot of effort into our automated testing, and we had a variety of different automated tests.

We had unit tests (run on every build by CruiseControl, our continuous integration server) which took five minutes to run and failed the build if they didn’t pass.

At a slightly higher level, our integration tests talked to SQL Server and source control systems. They took more like an hour to run, and again, were triggered by each build of the product.
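As a rough sketch of how these two tiers can be kept separate (shown here with Python and pytest markers rather than the team’s actual .NET and CruiseControl setup, so every name is illustrative), fast unit tests run in the quick stage on every build, while tests that need a real SQL Server or Subversion server are marked and run in the slower, hour-long stage:

```python
import pytest

def parse_object_name(qualified_name):
    """Tiny piece of hypothetical product logic exercised by the unit test below."""
    schema, _, name = qualified_name.partition(".")
    return schema, name

def test_parse_object_name():
    # Unit tier: no external dependencies, runs in minutes on every build
    # and fails the build if it doesn't pass.
    assert parse_object_name("dbo.Customers") == ("dbo", "Customers")

@pytest.mark.integration
def test_commit_round_trip():
    # Integration tier: would talk to a real SQL Server and Subversion repository,
    # so it is marked and selected separately, e.g.:
    #   pytest -m "not integration"   # fast unit stage
    #   pytest -m integration         # hour-long stage against real services
    # (register the "integration" marker in pytest.ini to avoid warnings)
    pytest.skip("requires a SQL Server instance and a Subversion repository")
```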

Finally, and this was really a new thing for us, we ran a fairly sophisticated set of automated graphical user interface (GUI) tests nightly, which simulated real product usage to Acceptance Test levels. We’ll go into more detail about that in a moment.

Of course, we also continued with manual testing, as there’s clearly a lot of value to be added by human testers. For starters, it was the test engineers who created all of the automated tests in the first place! The test engineers also manually tested each story, and a subset of tests had to be performed manually (such as manual regression tests and exploratory tests for each release).

GUI Test Automation

The GUI automation part of the test suite was a system that allowed us to simulate real customer usage of the product within a virtualized environment, and was run from our continuous integration server on a nightly basis. The suite of tests installed and tested the product in six different environments, and took roughly six hours to run.

In the automated GUI tests, we spawned an automation application, which launched a set of virtual machines containing different environments. It then installed various bits of infrastructure, installed the product, and ran tests that simulated real customer usage: talking to the product, setting up state, manipulating the UI, and then checking the state of the UI after those manipulations (such as whether the correct set of information is displayed, and whether the icons in the Object Explorer are the color that we would expect them to be).

The results were then fed back to CruiseControl, where they could be browsed like any other set of automated test results. A diagram of this entire process is shown in Figure 7.


Figure 7: Our newly-built automation framework
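The sketch below shows the general shape of such a nightly driver. The environment names and the four injected operations are hypothetical; the team’s real framework drove virtual machines and reported to CruiseControl, but none of these APIs are theirs:

```python
# Conceptual nightly GUI-automation driver: one run per target environment.
ENVIRONMENTS = [
    "Win7-SQL2008R2-SVN1.6",
    "XP-SQL2005-TFS2008",
    # ...six environments in total, one VM snapshot each (names are made up)
]

def run_nightly_suite(start_vm, install_product, run_gui_tests, publish_results):
    """Callers supply the four operations; this just sequences them per environment."""
    all_results = []
    for environment in ENVIRONMENTS:
        vm = start_vm(environment)            # revert the VM to a clean snapshot and boot it
        try:
            install_product(vm)               # install the SSMS add-in plus any infrastructure
            results = run_gui_tests(vm)       # drive the real UI and assert on its state
            all_results.append((environment, results))
        finally:
            vm.shutdown()
    publish_results(all_results)              # feed the results back to the CI server
    return all_results
```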

What Were the Benefits?

The nature of the product is such that there was a lot of complexity in the UI – it’s very tightly coupled with SSMS; it manipulates parts of the Object Explorer; it changes icons; it retrieves states; and so on. It would be very difficult to test that with any confidence in a non-UI sort of environment.

Moreover, given the complexity of the different environments we were supporting, we felt that GUI test automation was really something that would reap benefits. And it did catch bugs; it caught UI bugs where we were displaying the wrong sort of information, it caught regressions, it caught product crashes.

Ultimately, it meant that we had much less manual testing for each release. In fact, for each of those early access releases, we did no more than a couple of days of manual testing to complement the automated suite, and we calculated that one run of the automated GUI tests was equivalent to about four weeks of a test engineer’s time.

It also meant that we had much more confidence in those early builds than we might otherwise have had – we knew that the product basically worked. That made it an easy decision for us to say, whenever we had a bit more functionality ready, that we’d like to give it to users and find out if it was right.

What Were the Challenges?

It proved hard to make the suite reliable, and the higher you go up the testing stack, the more fragile tests tend to be. Having said that, in practice we knew which tests we could trust and which ones we needed to take with a pinch of salt. We knew what was a real product failure and what was not, so there was still a human element to this process.

We’d set out with the vision that we would keep testing in sync with development, but in practice we found that really difficult. In fact, because we put such a lot of effort into the test automation at the start of the project, we’d built up a backlog of tests quite early on. That took a lot of effort to reduce, and there was some testing roll-over between sprints. Initially, we attempted to close every single bug because we didn’t want to carry any technical debt forward, but about half-way through the project we realized that this wasn’t sustainable and wasn’t the best use of our time. At that point, we started focusing on closing only the high-priority bugs, and pushing some of the work out into the future if we didn’t think it was going to have a negative impact on the Version 1.0 release.

At the same time, if we encountered a really large bug or feature request, we would convert that into a story, estimate it, and put it on the board to be worked in (or not, as appropriate). Nevertheless, all the little bugs added up, and ended up taking a lot of time.

In short, it can be more time consuming to build automated tests than to write manual test cases. It took the whole team a lot of time to build the testing infrastructure, and it wasn’t just the test engineers’ time; it was developers’ time too. But obviously there is also a huge payback as a result, and now that we have that infrastructure in place, it will be a lot quicker to build future tests.

The Big Challenge – Time to Market

When we started the project there was really only one competitor product of any substance, and that was the database edition of Visual Studio 2008, which was perceived to have a very high learning curve, so we weren’t too worried about that.

However, we knew that Visual Studio 2010 was on the horizon, which would be a more serious competitor and could prompt a lot of users to start to evaluate and consider source controlling their databases. Therefore, it was important for us to be in the market at the same time.

So we put a lot of effort into sizing our backlog, and knowing the amount of work that we had to do at any given stage. Our attention to estimating and understanding how quickly we were getting through that work (our velocity) also helped us to understand how much work was left on the project and how long it would take us to complete.

Figure 8 shows the amount of work we had actually completed at any given stage, along with projections of where we would be if we continued working at our best, mean, or worst velocity; the red line at the top shows the size of the product backlog. The points where those projections intersected the backlog line gave our best/mean/worst estimates for when we could release the final version.


Figure 8: Our projected project timeline
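A minimal sketch of the kind of projection a chart like Figure 8 encodes; every number here is invented, whereas the real chart was built from the team’s actual sprint velocities and backlog size:

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(weeks=2)
remaining_points = 230                 # invented: story points left on the product backlog
projection_date = date(2010, 1, 18)    # invented: date the projection was made

# Invented best / mean / worst velocities, in points per sprint.
velocities = {"best": 25, "mean": 22, "worst": 11}

for case, velocity in velocities.items():
    sprints_needed = -(-remaining_points // velocity)   # ceiling division
    release = projection_date + sprints_needed * SPRINT_LENGTH
    print(f"{case:>5}: {sprints_needed:2d} sprints -> release around {release}")
```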

We’d had a brief flirtation with adding support for Visual SourceSafe before deciding against it, hence the random spike early on in the chart.

What we were also able to tell from this chart was that our best-case release date was the end of June 2010, and Visual Studio was actually set for release around the same time. But that was our best-case estimate – our worst case was the end of the year, maybe even the beginning of 2011.

So the effort we put into estimating paid off immediately; we knew that we had a problem, and we had to do something about it.

Getting to Market, Quickly

We had two options: we could add more people to the team, or we could reduce the scale of the project. We did add some people to the team (with great results, since they were very good, the existing code was well written, and we had good automated test coverage to catch regressions), but we weren’t confident that alone would solve the problem. We would have had to add a lot of people to guarantee an appropriate release date, and we felt that was a very, very risky thing to do and probably wouldn’t even work.

Instead, we looked at the backlog and started asking which features we could drop, and where the big wins were. However, there were really only one or two places where we could drop whole features. Even those were quite limited in scope, and in fact only got us about a third of the way closer to our target release date.

The real lesson we learnt from this was that we could solve the problem by looking at all of the stories again. We’d already sliced the stories small, but when we renewed our focus on them we asked how we could slice them even smaller. The entire team did this, and we challenged ourselves to find any extras that shouldn’t be in Version 1.0. We were looking for opportunities to do, let’s say, 20% of the work and get 80% of the benefit. Even more than when we initially set out, we were asking ourselves “Where are the really difficult things that don’t actually add that much value? What is the minimum set of stuff we can do here?”

Confrontational Splitting

So we split the stories down further, applied some very strict prioritization, challenged the team’s set of stories, and found that a very valuable approach was to ask “Is this story X more important than story Y?”

It turned out that, when you look at any story in isolation, it’s actually very easy to say “Of course we’ve got to have that, we can’t possibly do without it”. However, when you compare it against another story that you’ve already decided that you absolutely have to have, it’s a very different situation.

For example, we had decided that support for Team Foundation Server was clearly something we had to have, as that was half our target market. It became a very illuminating test for us to say of a story that we thought it was important, but not as important as Team Foundation Server support. Bringing in new team members also had the added benefit that they were able to challenge assumptions that the team had held for a long time about what was necessary or possible, and what was not.

Those were all very valuable processes for us to go through, and we managed to identify a fairly sizeable amount of work we could remove from the Version 1.0 backlog, which turned out to have no material effect on the success of the product.

By spending a couple of dozen hours in total looking at the backlog in really fine-grained detail, we managed to identify a lot of savings, and release at the same time as Visual Studio 2010.


Figure 9: Our actual project timeline

Estimates and Planning

Everything we’ve talked about so far in terms of how we scoped and controlled the project hinges on our estimation process, so let’s talk a little bit more about how that helped us to make better decisions throughout the project.

Even before the first sprint, the team had estimated most of the backlog. Not all of the backlog was estimated to the same fine degree; it was very large, it took a long time, and we got rather bored doing it, to be honest.

However, we estimated most of the backlog, and went through a variety of exercises to try to get a handle on our expected velocity. From our best and worst estimates of velocity we then derived the earliest and latest dates for the release (with quite a wide margin of error), and at that stage these estimates were largely backed up by gut feel.

Crucially, we didn’t only trust the numbers, nor did we rely on plucked-from-the-air guesstimates. We used both together, and they tended to agree with each other, so we had some faith in them. Indeed, after a few sprints we updated our predictions based on our observed velocities, and they generally held true.

Spikes

It’s worth mentioning as a quick aside that we found the use of spikes extremely useful in this project. A spike, in this context, is a time-boxed technical investigation to answer a particular question such as, “How easy is it to achieve this certain thing with this API? Can Subversion do it? Can Team Foundation Server support this sort of operation?”

The team found this approach very useful, especially for de-risking, but also when they couldn’t make an estimate. So we’d look at a story and we’d say “We don’t know what to estimate for this because it depends on the answer to this question, and we don’t know how easy it is to do this thing”.

Spikes were a very good solution to that problem; when we couldn’t anticipate or estimate something, we’d do a spike in the relevant sprint and then estimate it once we’d done that investigation. As it turns out, the team started to insist on doing that at any point where they couldn’t estimate due to a lack of information, and that helped to drive out a lot of uncertainty in the backlog. And it worked.

With this process for clearing up unknowns, and a solid estimation procedure, the team estimated their most likely velocity to be around 23 points per sprint, and their actual mean velocity over the course of the project turned out to be 22 points.

What Did We Learn?

So what were our lessons from this? Well, we think that good estimation really made a difference to the project. The team tried very hard to estimate their stories carefully, but that didn’t necessarily mean spending a huge amount of time on it. It just meant being thoughtful, saying what we could and couldn’t estimate, and doing something to clear up the things that couldn’t be estimated at the time.

Our product owner and project manager used those numbers frequently throughout the project to run “what if” scenarios. “What if we dropped this entire feature? What if we built just this part of this feature, and this other thing? If we prioritized that story, could we do the early access release sooner?” …those sorts of scenarios.

As a result, it felt like we had a good understanding of our status throughout the project, consistently backed up by gut feel, and there were very few, if any, radical moments of “oh my goodness, we’re not where we thought we wanted to be!”

Final Lessons

The most important things that we learnt from this project were:

  • An approach of incremental development and early access releases definitely helped us to build the right product. It helped us to build the product that people wanted and were prepared to pay money for. Additionally, in order to achieve that rapid, incremental development, test automation was vital – we just couldn’t have done without it.
  • Focusing on the backlog and putting effort into making that backlog work for us really helped to build the product quicker. We could easily have spent twice as long building this product if we hadn’t continuously groomed the backlog, and it’s not clear that having spent that extra time would have made a more successful product.
  • Finally, attention to the estimating process, and the planning that we could do on top of that, really helped us to make good decisions that we trusted throughout the project.

Of course, we didn’t get everything right. We spent a lot of time on the backlog up front, which didn’t feel very Agile and may have been partly wasted, since we cut features later. We had amazing UI designs that were extremely exciting, but they couldn’t all be implemented because we were trying to get a release out quickly. And we didn’t always complete the testing of all the stories that we pulled into a sprint.

There were also lots of stories that we remained undecided about for some time, and that had a cost. If we’d really been on top of the numbers right at the start of the project, we would have saved a lot of time by saying “No” earlier, and could have focused on just the genuinely crucial stories. Regardless, in the end, we delivered a tool in record time, delivered it on target, and were happy to put the Red Gate name on it.

This article has described the development of SQL Source Control 1.0. If you are interested in database source control, you can download, install, and try the latest version of SQL Source Control in under 5 minutes.