Are Unit Tests Overused?

Unit testing has come to dominate the many types of testing used in developing applications. This has inevitably been at the expense of other types, such as integration testing. Does a successful unit test regime ensure quality, or should we see unit testing as just one of a range of tests that can together give us confidence in an application?

Unit tests, and Test Driven Development in particular, are very fashionable right now, to the point where I feel we overuse them, applying them in places where they are not especially helpful.

More damaging, though, in my opinion, is that unit testing forces us to expose implementation details of our code. We often need to expose the dependencies in our code so the tests can eliminate them, when all we really want to expose is the higher-level encapsulation. Sometimes it does make sense to test an implementation in isolation, but it’s important not to put the cart before the horse. If the needs of the test reduce the quality of the architecture, or if the test is a tautology, then we should consider other techniques.

Rather than more and more unit testing, I’d prefer people to channel that extra energy into better approaches to integration testing.

The Good and Bad of Unit Tests

Over the past decade, development processes such as Test Driven Development (TDD) have gained prominence, with the laudable goal of simplifying designs, and making them more flexible and easy to change. TDD encourages small, incremental improvements to a working system: we first write a failing test (such as a unit test) that defines some part of the behavior of a unit (typically a method, or a single path through that method), and then strive to write the simplest code that will make that failing test pass, leading us to write very loosely coupled, well-encapsulated components, and so on. At least…that’s the beautiful theory.
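
As a minimal sketch of that cycle, using C# and NUnit purely as an example (the class and test names are invented): we write a failing test for one small piece of behavior first, then the simplest code that makes it pass.

```csharp
using NUnit.Framework;

[TestFixture]
public class PriceCalculatorTests
{
    // Step 1: a failing test that pins down one small piece of behavior.
    [Test]
    public void ApplyDiscount_TenPercent_ReducesPriceByTenPercent()
    {
        var calculator = new PriceCalculator();
        Assert.AreEqual(90m, calculator.ApplyDiscount(100m, 0.10m));
    }
}

// Step 2: the simplest implementation that makes the failing test pass.
public class PriceCalculator
{
    public decimal ApplyDiscount(decimal price, decimal rate)
    {
        return price - (price * rate);
    }
}
```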

The problem I find with unit tests is the same problem I find with design patterns, and other examples of very good ideas that find themselves tagged with the ‘best practice’ label: they get applied everywhere, often to the detriment of other forms of testing.

When we have succeeded in writing simple methods with very few dependencies, or when it is relatively easy to refactor the code so we can test it in isolation, unit tests are very effective.

However, real code has dependencies we can’t remove easily. In his recent (quite heavily criticized) piece on Unit Testing Myths and Practices, Tom Fischer observed that…

“…dynamically cobbling together different data stores, libraries, web services via dependency injection, configuration files, and plug-ins spawned an entire new breed of defects for which unit testing proved marginally effective.”

I have some sympathy with this point of view. We’re supposed to remove all these dependencies in order to write our unit tests. This means that we can write very specific tests and, as long as we’ve applied the DRY (Don’t Repeat Yourself) principle to our tests as well as our code, then when we make a breaking change our failing test will point us at the exact piece of code that has the issue. These tests will also help us catch edge-case bugs. However, removing dependencies in this way also starts to limit the usefulness of what we can test.
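
Here is a minimal, hypothetical sketch of that isolation (C# with NUnit; the names are invented): the database dependency is replaced by a hand-rolled stub so that the test exercises only the pricing rule and nothing else.

```csharp
using NUnit.Framework;

public interface ICustomerRepository
{
    bool IsLoyalCustomer(int customerId);
}

public class OrderService
{
    private readonly ICustomerRepository _customers;

    public OrderService(ICustomerRepository customers)
    {
        _customers = customers;
    }

    // Loyal customers get 10% off; everyone else pays full price.
    public decimal PriceFor(int customerId, decimal basePrice)
    {
        return _customers.IsLoyalCustomer(customerId) ? basePrice * 0.9m : basePrice;
    }
}

// A hand-rolled stub: the real database never enters the picture.
public class StubCustomerRepository : ICustomerRepository
{
    public bool IsLoyalCustomer(int customerId) => true;
}

[TestFixture]
public class OrderServiceTests
{
    [Test]
    public void LoyalCustomer_GetsTenPercentDiscount()
    {
        var service = new OrderService(new StubCustomerRepository());
        Assert.AreEqual(90m, service.PriceFor(customerId: 42, basePrice: 100m));
    }
}
```

Note that, to make this possible, OrderService has to accept its repository through the constructor; the dependency is now part of the class’s public surface, a point I’ll return to below.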

Not every issue arises solely from a small piece of incorrect code. Many bugs only appear when the dependencies are in place, for example, when we’re using the real data rather than a test database. When we’re writing code that uses a third-party library (or even using code written by more than one developer), the hard bugs will often come from misunderstandings about how they are supposed to interact, so it’s common to have two pieces of code that work in isolation but don’t actually work together. Unit testing eliminates the dependencies and can’t detect this.
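
A contrived illustration (entirely hypothetical code) of two components that each pass their own unit tests, yet disagree about the meaning of the value passed between them:

```csharp
// Module A returns the balance in cents...
public class PaymentGateway
{
    public long GetBalance(int accountId) => 1250;   // i.e. 12.50, represented as cents
}

// ...while module B assumes the value is already in whole currency units.
public class InvoicePrinter
{
    public string Print(long balance) => $"Balance: {balance:0.00} USD";
}

// Each class passes its own unit tests, but wired together the program
// prints "Balance: 1250.00 USD", a bug that only testing with both
// pieces in place will reveal.
```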

Of course, no one ever argued that unit tests on their own are sufficient to catch all bugs, or that they are a replacement for system and integration testing. However, my experience is that, in practice, unit tests quickly become the proverbial hammer that makes everything look like a nail. Many developers spend their time writing more and more unit tests, often only marginally useful ones, rather than spending that time on the other forms of testing that will catch the bugs that only appear when we’ve wired everything together.

There is, however, a more serious problem with unit tests than “overuse”, and it’s that they can destroy encapsulation.

The Importance of Encapsulation

There is a theory of the Universe, the Holographic Universe Theory, which, if you’ll allow me to simplify a little, says that we can derive an entire description of the Universe from knowledge of just the state on the outside. In other words, we don’t care what’s happening inside because encoded on the sphere that surrounds it is a full description of the universe.

A similar theory underpins good software design, and is manifest in principles such as component-oriented design and encapsulation. The internal representation of a component is, or should be, hidden; we don’t care about its inner workings as long as we fully understand its surface, the interface. The interface describes everything we need in order to understand how the component should behave, and to test whether or not it is behaving correctly, and we program only to that interface.

For me, this remains one of the deepest insights into software engineering. It is a founding principle of UNIX, and differentiated UNIX from other operating systems at the time. A UNIX pipeline is essentially a set of ‘black box’ programs connected by pipes, in the form of very narrow, very tightly specified interfaces. Each program “does one thing and does it well” and programs communicate via a simple text stream. This makes it easy to stick many programs together, just by having them send text to each other, and to check that the outputs are correct for the given inputs.

UNIX offers a very persuasive example of how encapsulation enables change. Look at any standard UNIX tool, and you’ll find that none of the documentation focuses on its inner workings. It just says, “This is what happens when you run ls or you run cat”. It means that anyone can come along and write a new version of cat, as long as its interface works exactly as described in the documentation.

It also means that anyone can check that a program does what it’s supposed to do just by reading the documentation, so tests are independent of the implementation.

By Eliminating Dependencies, We Can Destroy Encapsulation

The problem I see with overuse of unit tests is this: in order to isolate a piece of code, we often need to expose the dependencies so the test can eliminate them.

When designing code for release, we often want to keep the tests in a separate library (so that we don’t ship them to the customer). With unit testing, we also want to test individual implementations, which are often the things that are marked as private or internal. In order to do this, we have to make the lower-level implementation details visible; in .NET, for example, it’s necessary to use InternalsVisibleTo so that the test assembly has a privileged view of the code and can explicitly test this stuff. This means that implementation details wind up with dependencies on them from elsewhere, which is a broken architecture.
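
In C#, that arrangement looks something like this (the assembly and class names are hypothetical):

```csharp
// In the production assembly: the library now has to know about,
// and explicitly name, its own test assembly.
using System.Runtime.CompilerServices;

[assembly: InternalsVisibleTo("MyCompany.OrderProcessing.Tests")]

namespace MyCompany.OrderProcessing
{
    // An implementation detail, yet it is now reachable (and relied upon)
    // from outside the assembly that owns it.
    internal class DiscountCalculator
    {
        internal decimal Apply(decimal price) => price * 0.9m;
    }
}
```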

Ultimately, if tests contain assumptions about, or depend on, a particular implementation, then not only are they testing the behavior, but also testing that those assumptions and dependencies are still present. If we were to replace an implementation with an entirely new one where the outward results are equivalent, but which had removed a dependency, then the tests would fail, not because the code is now incorrect but because the tests are incorrect.
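
A small, hypothetical example of a test that would fail for exactly this wrong reason: it asserts how the result is produced rather than what the result is.

```csharp
using NUnit.Framework;

public interface IReportCache
{
    string TryGet(int reportId);
}

// A hand-rolled spy that records whether it was used.
public class SpyCache : IReportCache
{
    public bool WasQueried { get; private set; }
    public string TryGet(int reportId) { WasQueried = true; return null; }
}

public class ReportService
{
    private readonly IReportCache _cache;
    public ReportService(IReportCache cache) { _cache = cache; }

    public string GetReport(int reportId) => _cache.TryGet(reportId) ?? $"Report {reportId}";
}

[TestFixture]
public class ReportServiceTests
{
    // This test is coupled to the current implementation: it checks that
    // the cache was consulted, not that the right report came back.
    [Test]
    public void GetReport_QueriesTheCache()
    {
        var spy = new SpyCache();
        new ReportService(spy).GetReport(7);
        Assert.IsTrue(spy.WasQueried);
    }
}
```

Replace ReportService with an equivalent implementation that never touches a cache and this test goes red, even though every caller sees exactly the same behavior.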

Unit tests are supposed to prevent this by removing all of the dependencies, but run into the issue that in order to remove a dependency, we have to first know what the dependency is, and expose the implementation details where that dependency is made.

TDD’s “write the test first” philosophy can help here: if we don’t know the implementation in advance, we can’t make assumptions about it. However, in my experience, it’s common that tests change after we’ve written the implementation because of some newly discovered requirement, or we introduce new tests for bugs not covered by the original suite of tests. This is when the problem arises.

Ultimately, code is more maintainable the fewer implementation details we reveal through the API, because hiding them allows us to change the implementation without having to change other pieces of code. The more we “poke holes” in the interface, the more we reduce the quality of the API and the maintainability of the code.

Once we write poorly encapsulated code, other developers will be more tempted to peek inside and write dependent code that “secretly” uses knowledge of the other code’s inner workings, such as the real type of the returned item. Unit tests won’t tell you about this hidden dependency; the original code still does what it always did. However, at this point, it’s no longer “safe” to change the original code, as the other, new code (which could be anywhere) is dependent on the specific implementation of the method.
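
For instance (again a hypothetical sketch), a caller can quietly rely on the concrete type behind a returned interface:

```csharp
using System.Collections.Generic;

public class CustomerDirectory
{
    // The contract only promises an IEnumerable<string>...
    public IEnumerable<string> FindNames(string prefix)
    {
        return new List<string> { "Ada", "Alan" };   // ...but today it happens to be a List
    }
}

public class ReportBuilder
{
    public void Build(CustomerDirectory directory)
    {
        // "Secret" dependency on the current implementation: this cast works
        // only while FindNames really returns a List<string>. Switch the
        // implementation to 'yield return' or an array and this line throws,
        // yet no unit test on CustomerDirectory will ever notice.
        var names = (List<string>)directory.FindNames("A");
        names.Add("Grace");
    }
}
```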

Tests can only detect breakage; encapsulation can actively prevent it from ever happening in the first place, so when the two concepts are at odds, it’s better to sacrifice “unit testability” for encapsulation.

Exploratory Integration Testing

Unit tests play an important role in making reliable software, but I’ve come to regard them as a rather formal type of testing. A good analogy is the “bed of nails” tester in electronics, where, by rigorous design of the tests, we can make sure our implementation conforms to certain behaviors.

Unit tests have become the hammer that makes everything look like a nail. What we really need, I think, is not more and more unit tests, but better encapsulation, very tightly defined interfaces, and a better way of writing and automating integration tests that allow us to do a more ‘exploratory’ form of testing, with dependencies in place.

If we have a set of ‘black box’ programs using a simple form of communication across very narrow, very tightly specified interfaces, then it’s harder to test those interfaces, because we can’t test the individual functions of the things we’re plugging together.

What we need to be able to do, in electronic engineering terms, is get out the oscilloscope and a battery. Place our electrodes, measure the behavior; change an input, measure it again and see how it changed. In this way, we detect bugs by finding the places where the behavior of the code changes unexpectedly. We need better tools to support this sort of exploratory testing. As we explore the behavior of our code, the idea is that such a tool would record as much of this behavior as possible (which means function return values, and so on). We then detect bugs by asking the tool to do the same things again and seeing what has changed.

Consider, as an analogy, the humble spreadsheet. We can plug all sorts of complex calculations into a financial spreadsheet. Once they are all in there, we can start running ‘what if’ scenarios, where we modify our inputs regarding various revenue and expenditure streams and see what comes out of our financial model. This in essence is what exploratory testing is all about, and I’d like to be able to test software in a similar fashion. If I use this input for this bit of code, what is the behavior? In this form of testing, we don’t care about implementation, and the library of tests develops as we’re developing and understanding our code. We write some code, we test it straightaway; we can try it out and see what it does very quickly. Once we understand how the code is supposed to perform, by exploring it, we can write assertions for it and turn these into formal automated tests.

An additional advantage of this sort of “top-down” exploratory testing is that if we’re working from the public API, the tests are more in the form of “does the program do its primary task” rather than “is this specific implementation correct”, while still being able to detect and narrow down the same kinds of errors. If we replace implementation A with implementation B, and our tests only look at the interface, then they will provide some verification that the replacement was a success. In other words, tests that work at a more abstract level will start passing once we pick the right implementation.
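
To make that concrete, here is a hypothetical contract-style test (C#, NUnit) that knows nothing beyond the public interface, and therefore verifies implementation A and implementation B equally well:

```csharp
using System.Collections.Generic;
using System.Linq;
using NUnit.Framework;

public interface IQueue
{
    void Enqueue(int item);
    int Dequeue();
}

// Implementation A and implementation B: the internals differ completely,
// but both honour the same public contract.
public class LinkedQueue : IQueue
{
    private readonly LinkedList<int> _items = new LinkedList<int>();
    public void Enqueue(int item) => _items.AddLast(item);
    public int Dequeue()
    {
        var head = _items.First.Value;
        _items.RemoveFirst();
        return head;
    }
}

public class ArrayQueue : IQueue
{
    private int[] _items = new int[0];
    public void Enqueue(int item) => _items = _items.Append(item).ToArray();
    public int Dequeue()
    {
        var head = _items[0];
        _items = _items.Skip(1).ToArray();
        return head;
    }
}

[TestFixture]
public class QueueContractTests
{
    [Test]
    public void LinkedQueue_HonoursTheContract() => AssertFifoBehaviour(new LinkedQueue());

    [Test]
    public void ArrayQueue_HonoursTheContract() => AssertFifoBehaviour(new ArrayQueue());

    // The assertions see only the interface, so they keep passing
    // when we swap one implementation for the other.
    private static void AssertFifoBehaviour(IQueue queue)
    {
        queue.Enqueue(1);
        queue.Enqueue(2);
        Assert.AreEqual(1, queue.Dequeue());
        Assert.AreEqual(2, queue.Dequeue());
    }
}
```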

By testing at a higher level, with tests that are unaware of any dependency not exposed via the public API, we remove the problem of tests that test functionality as well as whether or not any underlying assumptions/dependencies are still valid. We also make it much less likely that we’ll introduce this problem later, by accident.

Of course, this form of “top down” testing means that a test might need to start up a significant chunk of the program we wish to test, and this might make it harder to pin down the specific part of the program that has failed. However, the important point is that the failure exists and we’ve detected it, thus preventing us from releasing bad code.

Summary

In my experience, developers channel the vast majority of their testing energies into unit tests, even though they know that just because the smallest components of a program all operate correctly in isolation, it doesn’t follow that the program as a whole is correct.

I don’t suggest that anybody believes that unit tests are a substitute for integration and system testing, but I do suggest that there’s a tendency for developers to get too involved with the mechanism rather than the intent, especially when looking at the smallest parts of a piece of software, and that this is how the imbalance creeps in.

Writing good software requires balancing many different interests, which are sometimes at odds. If you emphasize any one of them (say, tests) above all the others, you start to make trade-offs in other areas. One of the claims of TDD is that code that is unit testable is also automatically code that is well-architected (loosely coupled components, simple well-defined interfaces, and so on), but this isn’t necessarily true, for instance because unit tests often need to break encapsulation, in order to be actual unit tests rather than integration tests.

The sort of exploratory testing that I suggest is necessary, and for which good tools don’t currently exist, is top-down by nature, so we’d start at a higher level. The usual way to do this, now, is to run the program and play around with the UI, which is usually the most abstract interface in the entire application. I envisage the equivalent, but for playing around with the code.