Chapter 5 – Normalization

I won’t try to kid you: I love this subject. When you finally see the power and simplicity of it all, it is like a 200-watt halogen bulb going off over your head in a cavern where you previously had only one of those little LED lights on a keychain. In fact, it is the simplicity of normalization that makes it so much fun. Of course, knowledge of normalization isn’t the only thing that makes you a good database designer, and realistically, most people do some level of normalization without even knowing it. I dare you to find a database that consists of just one table with one column expressing more than one concept. Even a newbie with an Excel workbook will start to add worksheets as it becomes more difficult to manage the data in just one. However, understanding why you follow that natural process is half of the battle.

The approach I will take in this chapter is to break things down into four major sections:

  • Entity and Attribute Shape – Basically, this encompasses the definition of what makes a table a table and not some other data structure, along with First Normal Form (the first sketch after this list shows a typical violation and its fix).
  • Attribute Interdependency – Essentially Boyce-Codd Normal Form, which is basically ensuring that tables have all of their keys defined and that every non-key attribute depends on the whole of every key (this encompasses Second and Third Normal Form; the second sketch after the list shows the problem). Someone once commented to me that the Second and Third Normal Forms were confusing, and after some reflection, I think it has to do with the way we implement databases today. Second is about non-key attributes that depend on only part of a composite primary key, and Third is about non-key attributes that depend on other non-key attributes. But both definitions are stated in terms of the primary key, and the trend today is to use a surrogate primary key, so it just doesn’t click. In Boyce-Codd, it is about any key, rather than specifically the primary key. And a surrogate key isn’t really a key in and of itself, but a generated value that stands in for the key for ease of use and performance reasons.
  • Multivalued Dependencies – This covers Fourth and Fifth Normal Forms in a practical manner that isn’t confusing to the average programmer (the third sketch after the list shows the classic case).
  • Denormalization – Selectively choosing not to implement a rule of normalization to achieve some purpose, often to address a (hopefully real) performance issue (the last sketch after the list shows a typical example). This is not to be confused with unnormalized, which is just going with what you have because it works on your machine for the size of data and the questions you are asking of it. Also known as normalcy.
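To make the first bullet concrete, here is a minimal sketch, with table and column names entirely of my own invention, of a First Normal Form violation (one column holding a delimited list of values) and the single-valued shape that fixes it:

-- A shape that violates First Normal Form: PhoneNumbers holds a
-- comma-delimited list, so a single attribute expresses multiple values.
CREATE TABLE dbo.Customer
(
    CustomerId   int          NOT NULL CONSTRAINT PKCustomer PRIMARY KEY,
    CustomerName nvarchar(60) NOT NULL,
    PhoneNumbers varchar(200) NULL -- for example: '615-555-1234,615-555-9876'
);

-- The First Normal Form shape: each phone number is a single-valued
-- attribute in its own row (PhoneNumbers would then be dropped from Customer).
CREATE TABLE dbo.CustomerPhone
(
    CustomerId  int         NOT NULL
        CONSTRAINT FKCustomerPhone$Customer REFERENCES dbo.Customer (CustomerId),
    PhoneNumber varchar(20) NOT NULL,
    CONSTRAINT PKCustomerPhone PRIMARY KEY (CustomerId, PhoneNumber)
);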
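For the second bullet, here is a similarly invented sketch of the dependency problem: BuildingAddress is determined by BuildingId, which is not a key of the table, so the design gets decomposed until every attribute depends on a key:

-- Violates Third Normal Form (and thus Boyce-Codd): BuildingAddress is
-- determined by BuildingId, a non-key attribute, so the address repeats
-- for every employee assigned to the same building.
CREATE TABLE dbo.Employee
(
    EmployeeId      int           NOT NULL CONSTRAINT PKEmployee PRIMARY KEY,
    EmployeeName    nvarchar(60)  NOT NULL,
    BuildingId      int           NOT NULL,
    BuildingAddress nvarchar(100) NOT NULL
);

-- The fix: the address lives where BuildingId is the key; Employee would
-- then drop BuildingAddress and keep only the BuildingId reference.
CREATE TABLE dbo.Building
(
    BuildingId      int           NOT NULL CONSTRAINT PKBuilding PRIMARY KEY,
    BuildingAddress nvarchar(100) NOT NULL
);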
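For the Multivalued Dependencies bullet, the classic Fourth Normal Form sketch (again, all names are mine) is a table that records two independent multivalued facts about an instructor, forcing a row for every combination:

-- One table holding two independent multivalued facts (the classes an
-- instructor can teach and the textbooks they use) requires a row for
-- every class/textbook combination -- the redundancy 4NF removes.
CREATE TABLE dbo.InstructorClassTextbook
(
    InstructorName nvarchar(60) NOT NULL,
    ClassName      nvarchar(60) NOT NULL,
    TextbookName   nvarchar(60) NOT NULL,
    CONSTRAINT PKInstructorClassTextbook
        PRIMARY KEY (InstructorName, ClassName, TextbookName)
);

-- The Fourth Normal Form decomposition: each independent fact gets its own table.
CREATE TABLE dbo.InstructorClass
(
    InstructorName nvarchar(60) NOT NULL,
    ClassName      nvarchar(60) NOT NULL,
    CONSTRAINT PKInstructorClass PRIMARY KEY (InstructorName, ClassName)
);

CREATE TABLE dbo.InstructorTextbook
(
    InstructorName nvarchar(60) NOT NULL,
    TextbookName   nvarchar(60) NOT NULL,
    CONSTRAINT PKInstructorTextbook PRIMARY KEY (InstructorName, TextbookName)
);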
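And for the Denormalization bullet, a sketch (invented names once more) of the typical trade: Invoice carries a TotalAmount that duplicates the sum of its line item amounts to avoid aggregating on every read, and the duplicated value then has to be maintained by a trigger or application code, which is exactly the cost you accept for the read-time benefit:

-- Denormalized on purpose: TotalAmount duplicates SUM(InvoiceLineItem.Amount)
-- so reads avoid the aggregation; writes must keep the value in sync.
CREATE TABLE dbo.Invoice
(
    InvoiceId   int   NOT NULL CONSTRAINT PKInvoice PRIMARY KEY,
    TotalAmount money NOT NULL CONSTRAINT DFLTInvoiceTotalAmount DEFAULT (0)
);

CREATE TABLE dbo.InvoiceLineItem
(
    InvoiceId      int   NOT NULL
        CONSTRAINT FKInvoiceLineItem$Invoice REFERENCES dbo.Invoice (InvoiceId),
    LineItemNumber int   NOT NULL,
    Amount         money NOT NULL,
    CONSTRAINT PKInvoiceLineItem PRIMARY KEY (InvoiceId, LineItemNumber)
);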

One of my desires for each edition of the book has been to tackle the higher normal forms in better ways that a person like me can understand. I am pretty thick at times, and a lot of concepts just go over my head in their mathematical-notation-laden versions. Not that I am against that sort of thing; it is just that I don’t really get it until I have a simple example to follow. Even worse, too many writers gloss over the details of complex topics by simply puking up the wording of the original writer and moving along, not even giving examples. I feel that if you are going to bring up a topic, you should cover it, give examples, and explain it. That isn’t always practical, and often I wish I didn’t feel that way (and I wish my editors didn’t feel that way either).

In this edition, I am going to be changing up a bit of my material on the First, Fourth, and Fifth Normal Forms, moving some of the non-key multivalued dependency material to First Normal Form, since what I describe there is technically First Normal Form’s requirement that attributes be single valued, though I have seen it described in multiple places as Fourth. Again, kind of confusing.

Along the way, I have been doing some research to see if I can find a better way to describe the Fourth and Fifth Normal Forms. I wanted to list a lot of great sites about normalization, but I can’t really find much out there other than William Kent’s paper here: http://www.bkent.net/Doc/simple5.htm. It was written back in 1982 (September, to be exact), but it was the best overall treatment I could find. Most other sites were too terse to glean more than an idea from, certainly not enough to recommend as reading.

After I finish chapter 5, it will be time to make some major structural choices about the book and decide whether I want to dig into the new pattern/modeling chapters or finish the rehash. The rehash is kind of mundane, though a decent amount of it will require touch-ups to cover the new Denali changes to… well, I am not sure what is or isn’t under NDA, but suffice it to say… there may or may not be changes!