The myth of over-normalization

I’ve always been suspicious of denormalizing an OLTP database. Denormalisation is a strange activity that is supposed to take place after a database has been normalized, and is assumed to be necessary in order to reduce the number of joins in queries to a tolerable level. C.J. Date is quite clear on this; well, he is slightly less opaque than usual: any denormalization to a level below 5NF is a ‘bad thing’ (he says ‘contraindicated’). 

In practice, normalization to fifth Normal Form is unusual. Normally, the Database designer reaches for his hat and coat after reaching Boyce/Codd Normal Form (BCNF), which is 4NF, and few databases I’ve ever seen are even reliably BCNF.

Why does one ever normalize a database? It is often said that it is to avoid logical inconsistencies, and to avoid insertion, delete and update anomalies. Also, to avoid redundancy, or duplication, of data. I think there is more to it than that. It also ensures that your data model makes logical sense. If, at the end of the normalization process, you arrive at a set of tables that correspond to simple, easily understood entities, then the chances are that you have got a database model that will sail through the inevitable changes in scope, changes in the application, extensions and so on. If you don’t, then it is time to tear up your database design and start again. Normalization isn’t like sprinkling on fairy-dust; it is a way of testing and ‘proving’ your design.

Too often, denormalization is suggested as the first thing to consider when tackling query performance problems. It is said to be a necessary compromise to be made when a rigorous logical design hits an inadequate database system. As the saying goes, “Normalize ’til it hurts, then denormalize ’til it works”. In fact, Denormalization always leads eventually to tears. It complicates updates, deletes and inserts; it renders your database difficult to modify.  Maybe once there was an excuse for a spot of denormalization, but on a recent version of SQL Server or Oracle, with indexed or materialized views, and covering indexes, the performance hit from multiple joins in a query is negligible. If your database is slow, it isn’t because it is ‘over-normalized’!

As always, we’d like to hear what you think. The best comment, according to our distinguished panel of judges, will receive a $50 Amazon voucher, and three runners-up will get a Simple-Talk gift pack.

  

Cheers,

Tony.