An Agile House of Stone

A couple of weeks ago, Tony discussed why complex databases can’t be created in the Agile way. As someone that works on code from the database right up to the user interface, I wonder what makes a database so different from application code. The arguments in favour of an agile approach are just as strong in databases as in code – in particular you can’t predict future requirements, and if you attempt to do so, you are very likely to get them wrong, or build something far more complex or larger than actually required.

Tony specifically mentioned that a database has to be planned in detail beforehand if it is to perform quickly, reliably and securely over time, but software often has the same requirements, and agile development already has approaches to deal with them, in code. If, say, performance really is a requirement for your application, then why not have an automated test suite that can consistently test performance? A key part of many agile practices is refactoring, and having a suite of automated regression tests gives you the confidence to make changes while knowing that everything still works. The automated part is particularly important – if you’re doing these tests (or, arguably more accurately, checks) manually, then their cost will keep growing as your application grows.

What is it about a database that makes an agile approach to development harder than with code? There are certainly different challenges in an agile approach to database development. One difference is the fact that at least some part of it is required to persist. With software, you can often just restart the application with the new code; with databases, you must migrate any existing data to the new schema. If you’re constantly changing your database, then you need to do this repeatedly. However, in my experience, this has only been a problem where the original schema is poorly understood – “What does that column do?”, “Is that date column the creation date, or modified date?”, “What does null mean?”, “Is that text plain-text or HTML-encoded?” and so on. Once you have a good handle on your existing (as well as new) schema, data migration isn’t usually a hellish experience, especially if you’re taking an agile approach by taking a small steps, so that you only have to migrate a small part of your database to an updated schema at once.

Another impediment to database refactoring is “leaky abstractions”. It is difficult to protect the application completely from changes in the database. Ideally, all database access would be underneath some layer so that, assuming the business logic remains the same, the layers beyond the abstraction are not affected by changes in the database. Of course, abstractions often leak, and a faulty database abstraction can leak all the way up to user interface code. This means that we still need to change code at every layer of the stack, despite having a database abstraction layer. However, this is really no different to any other part of your codebase that has such a central role. Strict adherence to DRY (don’t repeat yourself), often requiring ruthless refactoring, is crucial in minimising the cost of changing your database.

Refactoring can also be hindered by the changes required in the database abstraction layer itself. For instance, if you rename a column, you might have to find all the references to that column in dozens of SQL strings. This is where an ORM such as NHibernate, with its QueryOver interface, can come in handy. These allow you to write database queries in the comfort of C#, Java, or your language of choice. Design your database schema carefully, and use an ORM to perform a literal mapping from a database row to an object in your static language – if your database row has a field called “Title” of type varchar(255), then your corresponding object should have a field called “Title” of type String. You can lean on existing tools to, say, rename a field across your entire codebase, allowing you to refactor more quickly, and with greater confidence that you haven’t missed a pesky reference in an obscure part of the code. Even if you do reference a field that doesn’t exist, the compiler will point it out. This is no replacement for a thorough suite of tests, but it can give you a little more confidence and help you find errors more quickly.

Finally: would I build a house with an agile approach? No – but that’s because the costs of building a house are completely different. With both software and databases, I can delete huge swathes with a single key, or make changes to any component, and have a suite of tests that will tell me within minutes if I’ve done something horrifically wrong.

Cheers,

Michael