Moving from Practice to Production with Test-Driven Development

How do I stop dozens of tests failing every time I change something?

One of the most common problems when adopting test-driven development is that a single change in behaviour can break dozens of tests, each of which then needs to be fixed one by one. Let’s take a look at some of the causes and how to avoid them.

Duplication in setup code

If we were writing a blogging platform, we might have a BlogPost class that has a title and a body. When we need one in a test, we could just call its constructor:
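The original listing is not reproduced here, so the following is a minimal sketch in Python with pytest-style test functions. The BlogPost fields come from the description above; `render_heading` is a hypothetical piece of code under test, introduced only so the tests have something to check.

```python
from dataclasses import dataclass


@dataclass
class BlogPost:
    """Stand-in for the BlogPost class described above (title and body)."""
    title: str
    body: str


def render_heading(post: BlogPost) -> str:
    """Hypothetical code under test: wraps the post title in a heading tag."""
    return f"<h1>{post.title}</h1>"


def test_heading_contains_title():
    # The constructor is called directly, so the test has to supply a body
    # even though the body has no effect on the result.
    post = BlogPost("Apples", "Placeholder body text")
    assert render_heading(post) == "<h1>Apples</h1>"


def test_heading_keeps_spaces_in_title():
    # A second direct constructor call, and one more line to edit if the
    # constructor ever gains a new required argument.
    post = BlogPost("Bananas and Pears", "Placeholder body text")
    assert render_heading(post) == "<h1>Bananas and Pears</h1>"
```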

The problem is that we might use the same constructor in many tests within the same file. If we change the constructor, such as by adding a date, we then need to change every usage of the constructor.


A better solution is to avoid calling the constructor directly:
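As a sketch of that idea in Python (pytest-style), assume BlogPost has just gained a `published` date. `create_blog_post_with_title` is the Python spelling of the article's CreateBlogPostWithTitle; `render_heading` and the placeholder values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BlogPost:
    """Stand-in for the article's BlogPost, after a date has been added."""
    title: str
    body: str
    published: date


def render_heading(post: BlogPost) -> str:
    """Hypothetical code under test: wraps the post title in a heading tag."""
    return f"<h1>{post.title}</h1>"


def create_blog_post_with_title(title: str) -> BlogPost:
    # The only constructor call in this test module. When the constructor
    # changes, this is the single line that needs updating.
    return BlogPost(title, "Placeholder body text", date(2000, 1, 1))


def test_heading_contains_title():
    post = create_blog_post_with_title("Apples")
    assert render_heading(post) == "<h1>Apples</h1>"


def test_heading_keeps_spaces_in_title():
    post = create_blog_post_with_title("Bananas and Pears")
    assert render_heading(post) == "<h1>Bananas and Pears</h1>"
```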

By using CreateBlogPostWithTitle, we only need to update one call to the constructor within the test class, rather than updating every test case. If the same code is used across different test classes, we should consider moving CreateBlogPostWithTitle into a separate class.

As an added bonus, the test now contains only the information that’s relevant to it. We were originally including the body of the blog post, which didn’t affect the behaviour under test. The setup code now mentions only the detail that matters: the title of the blog post.

Asserting more than one behaviour in a single test

That you can test too little might be obvious, but the reverse is also true: it’s possible to test too much in a single test. When writing a test case, try to focus on one specific behaviour. For instance, suppose that we were testing a function that takes any string and converts it to a slug (a string suitable for use as part of a URL). As a first step, we want to convert any uppercase characters to lowercase, so we write the following test:
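A minimal Python sketch of such a test, with `slugify` as a hypothetical name for the function under test and just enough implementation to make the test pass:

```python
def slugify(text: str) -> str:
    """Hypothetical slug function at its first step: lowercasing only."""
    return text.lower()


def test_uppercase_characters_are_converted_to_lowercase():
    assert slugify("ABC") == "abc"
```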

Once we’ve made it pass, the next step is to convert whitespace characters into hyphens. We might be tempted to write the following test:
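In Python, the tempting version might look like the sketch below. The function name `slugify` and its regular-expression implementation are assumptions for illustration; the test name follows the article's WhitespaceCharactersAreConvertedToHyphens.

```python
import re


def slugify(text: str) -> str:
    """Hypothetical slug function: lowercases, then hyphenates whitespace."""
    return re.sub(r"\s", "-", text.lower())


def test_whitespace_characters_are_converted_to_hyphens():
    # The mixed-case input means this assertion also depends on lowercasing.
    assert slugify("Hello World") == "hello-world"
```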

However, we’re actually testing two behaviours here: as well as testing the conversion of whitespace to hyphens, we’re also testing the conversion of uppercase to lowercase.


We can test the behaviour in isolation by not including any uppercase characters in our original string:
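A sketch of the isolated test in Python follows. `slugify` and `slugify_preserving_case` are hypothetical; the second exists only to illustrate what happens if case handling changes later.

```python
import re


def slugify(text: str) -> str:
    """Hypothetical current behaviour: lowercase, then hyphenate whitespace."""
    return re.sub(r"\s", "-", text.lower())


def slugify_preserving_case(text: str) -> str:
    """Hypothetical future behaviour that keeps uppercase characters."""
    return re.sub(r"\s", "-", text)


def test_whitespace_characters_are_converted_to_hyphens():
    # All-lowercase input: only the whitespace rule is being exercised.
    assert slugify("hello world") == "hello-world"


def check_effect_of_changing_case_handling():
    # Illustration only: the isolated input still gives the expected slug,
    # while the mixed-case input from the earlier test no longer does.
    assert slugify_preserving_case("hello world") == "hello-world"
    assert slugify_preserving_case("Hello World") != "hello-world"
```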

The advantage of our new test is that it’s less likely to fail when unrelated behaviour changes. For instance, suppose we decided that the uppercase characters should be preserved rather than converted to lowercase characters. The original implementation of WhitespaceCharactersAreConvertedToHyphens would have failed once the changes had been made, whereas the second version would continue to pass.

Redundant tests

As well as testing too much in a single test case, it’s possible to have too many test cases. Each test case has costs associated with it: the time taken to run the test, and the time to maintain it. In return, the test case should give you some new information about how well the code is working. If the test is describing the same behaviour as another test, consider removing it.

As a rough rule of thumb, if you have two tests, ask yourself the question: would I ever expect one of these tests to fail, but not the other? If not, then you can probably get rid of one of them. For instance, consider the following tests:
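The two tests are not reproduced here. Judging by how they are described later in the article (a single whitespace character becomes a hyphen, and a run of whitespace becomes a single hyphen), a Python sketch might look like this, with `slugify` as a hypothetical implementation:

```python
import re


def slugify(text: str) -> str:
    """Hypothetical slug function: collapses each whitespace run to a hyphen."""
    return re.sub(r"\s+", "-", text.lower())


def test_whitespace_character_is_converted_to_hyphen():
    assert slugify("a b") == "a-b"


def test_run_of_whitespace_is_converted_to_single_hyphen():
    assert slugify("a  b") == "a-b"
```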

Is it possible that one of these tests might fail while the other passes? The answer is yes: if we’re only replacing the first whitespace character we find with a hyphen, then it’s possible for the first test to succeed while the second fails.

We could add a third test for the case of four whitespace characters:
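Such a test could be sketched in Python as follows, again with a hypothetical `slugify`:

```python
import re


def slugify(text: str) -> str:
    """Hypothetical slug function: collapses each whitespace run to a hyphen."""
    return re.sub(r"\s+", "-", text.lower())


def test_four_whitespace_characters_are_converted_to_single_hyphen():
    # Exercises the same rule as the two-character case, just with more input.
    assert slugify("a    b") == "a-b"
```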

However, the value of such a test is dubious: we probably wouldn’t expect this test to fail while the previous test passes (or vice versa). Although it’s possible that the code could be written to pass in one case but not the other, you’d probably have to be intentionally malicious to cause that behaviour. As a guide, assume (potential) stupidity on the part of the implementer, not malice.

Why does my test code look exactly like my production code?

There are some occasions when your test code ends up looking very similar to the code under test. Such tests repeat the logic of the production code, flaws included, rather than describing the expected behaviour and checking that it works correctly.

One area where this commonly comes up is in data access layers that allow access to a SQL database. For instance, here’s a class for counting the number of blog posts in a database, along with its test:
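The original class and test are not reproduced here. A Python sketch of the same shape uses `unittest.mock` from the standard library; the class name `BlogPostRepository`, the table name `blog_posts`, and the connection interface (an `execute` method returning a cursor) are all assumptions.

```python
from unittest.mock import Mock


class BlogPostRepository:
    """Hypothetical data access class that counts blog posts."""

    def __init__(self, connection):
        self._connection = connection

    def count(self) -> int:
        cursor = self._connection.execute("SELECT COUNT(*) FROM blog_posts")
        return cursor.fetchone()[0]


def test_count_returns_number_of_blog_posts():
    # The mock is told what execute() and fetchone() should return, so the
    # test restates the calls made by count() and never runs the SQL.
    connection = Mock()
    connection.execute.return_value.fetchone.return_value = (3,)

    repository = BlogPostRepository(connection)

    assert repository.count() == 3
    connection.execute.assert_called_once_with("SELECT COUNT(*) FROM blog_posts")
```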


We can see that the calls to the database are duplicated in the original class and in the test. The problem here is that the most significant piece of logic is the SQL query itself, which isn’t tested at all. A better solution would be to turn our unit test into an integration test by using a real database.
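A sketch of such an integration test in Python is shown below. It assumes an in-memory SQLite database via the standard `sqlite3` module as a stand-in for whatever database the project really uses; the repository class, table schema, and placeholder values are hypothetical, while `add_blog_post` and `create_temporary_database` follow the article's AddBlogPost and CreateTemporaryDatabase.

```python
import sqlite3
from contextlib import contextmanager


class BlogPostRepository:
    """Hypothetical data access class backed by a real connection."""

    def __init__(self, connection):
        self._connection = connection

    def add(self, title: str, body: str) -> None:
        self._connection.execute(
            "INSERT INTO blog_posts (title, body) VALUES (?, ?)", (title, body)
        )

    def count(self) -> int:
        cursor = self._connection.execute("SELECT COUNT(*) FROM blog_posts")
        return cursor.fetchone()[0]


@contextmanager
def create_temporary_database():
    # Assumption: an in-memory SQLite database with a minimal schema.
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute(
            "CREATE TABLE blog_posts (title TEXT NOT NULL, body TEXT NOT NULL)"
        )
        yield connection
    finally:
        # Closing the connection discards the in-memory database.
        connection.close()


def add_blog_post(repository: BlogPostRepository) -> None:
    # Title and body are irrelevant to counting, so they stay hidden here.
    repository.add("Placeholder title", "Placeholder body")


def test_count_returns_number_of_blog_posts():
    with create_temporary_database() as connection:
        repository = BlogPostRepository(connection)
        add_blog_post(repository)
        add_blog_post(repository)
        add_blog_post(repository)

        # This time the SQL query itself is executed against a real database.
        assert repository.count() == 3
```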

We’ve added a method AddBlogPost that will add a blog post to the database using the repository. In this case, the title and body of the post are irrelevant, so we don’t need to pass them as arguments. As before, this also insulates us against changes in the way blog posts are added to the repository.

There are a few things to watch out for. We’ve turned our unit test into an integration test, which means it’s likely to be trickier to set up. Specifically, CreateTemporaryDatabase needs to create a temporary database, open a connection to the new database, and then drop that database at the end of the test. The difficulty of this will vary depending on what database you’re using – for instance, SQLite can create databases in memory:
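As a small illustration using Python's standard `sqlite3` module (the table name is hypothetical): each connection opened with the special name `":memory:"` gets its own private database, which disappears when the connection is closed, so there is nothing to drop afterwards.

```python
import sqlite3

# Two separate in-memory databases, one per connection.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")

first.execute("CREATE TABLE blog_posts (title TEXT)")

# The table created through `first` is not visible through `second`.
tables_in_first = first.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
tables_in_second = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# Closing a connection throws its in-memory database away.
first.close()
second.close()
```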

Integration tests also tend to be slower and less reliable than unit tests. To keep your test suite fast and reliable, keep layers such as data access as thin as you can, so that most of the code can still be covered by unit tests.

How should I change the tests when I extract a method from a method already under test?

Sometimes we want to extract an existing piece of code into its own function that can be reused. When we extract the code, we should also extract appropriate test cases from the original function. However, if we just copy and adjust the existing tests, we’ve introduced unnecessary redundancy into our test suite. On the other hand, we want to make sure the original function continues to behave as expected, so we can’t just delete the original tests for the extracted functionality.

For instance, suppose we’ve written a function that imports blog posts from Word documents, and we’ve written some tests for that functionality. Part of the import process might involve generating slugs from the title of the document, so we’d have some tests for that specific functionality:
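The original tests are not reproduced here. In the Python sketch below, everything is hypothetical: `WordDocument` stands in for an already-parsed Word document (the parsing itself is out of scope), and `import_document` generates the slug inline, as it would before any extraction.

```python
import re
from dataclasses import dataclass


@dataclass
class WordDocument:
    """Hypothetical stand-in for a parsed Word document."""
    title: str
    body: str


@dataclass
class BlogPost:
    title: str
    body: str
    slug: str


def import_document(document: WordDocument) -> BlogPost:
    # Slug generation lives inside the importer at this stage.
    slug = re.sub(r"\s+", "-", document.title.lower())
    return BlogPost(document.title, document.body, slug)


def create_document_with_title(title: str) -> WordDocument:
    return WordDocument(title, "Placeholder body")


def test_whitespace_in_title_is_converted_to_hyphen_in_slug():
    post = import_document(create_document_with_title("a b"))
    assert post.slug == "a-b"


def test_run_of_whitespace_in_title_is_converted_to_single_hyphen_in_slug():
    post = import_document(create_document_with_title("a  b"))
    assert post.slug == "a-b"
```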

Now, say we want to extract the slug generation into a separate function. Any of our original tests that tested slug generation should be converted into tests for our new slug generation function.

The question is: what do we do with our original tests? The document importer hasn’t changed its behaviour, but if we keep the original tests, then we’re testing the same behaviour twice. This would lead to brittle tests, as described above.


One option is to use mocks for the slug generation. We can then change the implementation of slug generation without affecting the tests for the import function. While this is often an appropriate response, it means introducing more boilerplate code in our tests. It also means that the tests are exposed to the interface of the slug generation function. If the slug generation interface changes, we’d need to update both the import function and its tests, rather than just the import function.
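One way the mock-based option could look in Python is sketched below, assuming the importer receives the slug generator as a parameter. That injection style, and all the names used, are assumptions rather than anything the article prescribes.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class WordDocument:
    """Hypothetical stand-in for a parsed Word document."""
    title: str
    body: str


@dataclass
class BlogPost:
    title: str
    body: str
    slug: str


def import_document(document: WordDocument, generate_slug) -> BlogPost:
    # Assumption: the slug generator is passed in so tests can replace it.
    return BlogPost(document.title, document.body, generate_slug(document.title))


def test_slug_is_generated_from_document_title():
    generate_slug = Mock(return_value="stubbed-slug")

    post = import_document(
        WordDocument("Some Title", "Placeholder body"), generate_slug
    )

    # The test no longer cares how slugs are built, but it does depend on the
    # generator's interface: one positional argument, the document title.
    assert post.slug == "stubbed-slug"
    generate_slug.assert_called_once_with("Some Title")
```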

In cases such as this, an alternative is to leave a single test that ensures that we’re calling the child function, but to leave the thorough testing to the direct tests of the child function. Ideally, we’d choose a single test that relies on the behaviour least likely to change. In our example, we might keep the first test (verifying single whitespace characters are converted to a hyphen) since it’s unlikely we’d start converting whitespace to a different character. If we did, then fixing this one case is relatively quick. If we were to change slug generation so drastically, then a test failure might even be useful to make us consider whether such a change is suitable for each use of the slug generation function.

On the other hand, we should discard the second test (verifying runs of whitespace are converted to a single hyphen). This edge case is already covered by the direct tests of slug generation, and is more likely to change than the first test. In reality, we’d probably be discarding a much larger number of tests, having already converted them to tests for the child function.
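Putting this together, the end state might look like the following Python sketch. All names are hypothetical: `slugify` is the extracted function, tested thoroughly and directly, while the importer keeps a single test showing that it uses slug generation at all.

```python
import re
from dataclasses import dataclass


@dataclass
class WordDocument:
    """Hypothetical stand-in for a parsed Word document."""
    title: str
    body: str


@dataclass
class BlogPost:
    title: str
    body: str
    slug: str


def slugify(text: str) -> str:
    """The extracted slug generation function."""
    return re.sub(r"\s+", "-", text.lower())


def import_document(document: WordDocument) -> BlogPost:
    return BlogPost(document.title, document.body, slugify(document.title))


# Direct tests of the extracted function carry the detailed cases.
def test_uppercase_characters_are_converted_to_lowercase():
    assert slugify("ABC") == "abc"


def test_whitespace_character_is_converted_to_hyphen():
    assert slugify("a b") == "a-b"


def test_run_of_whitespace_is_converted_to_single_hyphen():
    assert slugify("a  b") == "a-b"


# The one test kept on the importer, using the behaviour least likely to change.
def test_imported_post_has_slug_generated_from_title():
    post = import_document(WordDocument("a b", "Placeholder body"))
    assert post.slug == "a-b"
```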


We’ve looked at solutions for a few problems you might encounter when using test-driven development on a real project, although the list is far from comprehensive. The general ideas are just as important as the specific details:

  • Don’t repeat yourself (just like production code). If you’re repeating yourself, consider adding another layer of abstraction.
  • Don’t test too much. Write the simplest test you can to make sure one specific behaviour is working.
  • Don’t add redundant tests. If two tests will always pass or fail together, consider removing one of them.
  • Try to remove details that aren’t directly relevant to the behaviour under test. If you need some placeholder values, see if they can be hidden in helper functions.
  • Make sure you’re testing the most significant part of the production code. In database access code, that’s often SQL queries, which means using a real database might be the best test.