After a young developer (or database developer) learns about normalization, the quality of his databases increases a lot, there is no doubt about that.
However, sometimes we need some denormalization, even in production databases. I faced some experienced developers that, although, they would agree this should be analyzed in a case-by-case scenario, they never saw an example of this situation.
There is a very simple and common example of this situation and we can even go beyond the example and draw some simple rules about this.
First, one small disclaimer: People with way more knowledge about modelling than me may argue denormalization in production doesn’t really exist. The fields below are, in fact, two different information. We are used to put the same name to them and therefore, we think they are the same information. Sometimes we even drop one of them from the database.
However, these are only two different ways to look to the same thing and achieve the same result. In my opinion, looking into this as denormalization is easier.
What’s wrong with the model below
This is a classic model about products and orders. Of course, the model is simplified, but the challenge is to notice what is wrong with this model. Does it appear Ok?
What happens with the record of old orders when the company decides to raise the price of the products ?
Probably you already guessed the result: Every time a raise in prices happens, this affects old orders as well, destroying the data.
The Solution: Denormalization
It’s easy to solve this problem: When we register an order, we need to copy the price together the order, as a static value.
The new model will be like this:
According to the 3rd normal form, we can’t have any transitive dependency between any field and the primary key. In other words, the primary key of a table needs to uniquely identify all the field values, the values should not be linked to any other field.
The UnitPrice field in the OrderDetail table has a dependency with ProductId field, which is not part of the OrderDetail primary key. This establishes a transitive dependency and due to that, this table is not in the 3rd normal form
How to Identify these scenarios
In order to identify these situations when we face them, we can look for one specific scenario in our database model. This scenario is easier to identify if we apply a BI point of view.
From a BI point of view, we have facts happening in our business and these facts are defined by many linked information to them, the dimensions.
The order is a fact, something that happens in our business. The product is a dimension linked to the fact. An order was made, and this order is related to a product.
The fact is static along the time and that’s what highlights our problem and solution. The information related to the fact can’t change. So, the price from the moment of the order needs to stay static. That’s when we will know we need to “copy” the price from the Product table to the OrderDetail table.
Everything is a misunderstanding
Some people could say this is not denormalization at all, it’s just a misunderstanding. They consider the price of the product on the Product table is not the same information as the price of the product in the OrderDetail table. They have a different lifecycle, different meaning. The fact they will have the same value for a while is not important at all.
I, in my humble opinion, prefer the first point of view, based on the idea of facts. It’s easy to identify and follow.
I’m not a book worm about detailed concepts on database modelling. In case I missed something, feel free to add on the comments. However, I believe this is a practical approach to identify a scenario where denormalization is needed in production. I already found developers get confused with this.