The second pillar – Normal

The first pillar was easy, since no reasonable person is going to argue that an incoherent design is desirable. No matter the type of system, any design that isn't easy to understand is likely to be a bad one (with the obvious caveat that it only needs to be understandable to other people with a reasonable grounding in the subject, tool, or language). The second pillar is where things get a bit more interesting, or even "religious."

In the initial post, I defined this pillar as: normalized as much as possible without harming usability/performance (based on testing).

Sounds easy enough, right? It is, but the problem is that there is a real lack of understanding of what normalized really means and, to be honest, why you would want to do it in the first place. To fully cover the what and why of normalization would take a lot more than one post (I do an hour-long presentation where I have to talk like a chipmunk at Starbucks just to cover an overview, and my book dedicates 60+ pages to the subject).

Briefly, the basic gist of normalization is that you don't duplicate data. Every single piece of data in a relational database has a very specific job, and it is the ONLY point of data that does that job. It is neither the source of another piece of data, nor sourced from some other piece (or pieces) of data.
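To make that concrete, here is a minimal sketch (the table and column names are hypothetical, not from any real system). In the first table the customer's name is copied onto every order row; in the normalized version that fact has exactly one home and the orders simply reference it by key:

-- Denormalized: the customer's name is repeated on every order,
-- so the same fact lives in many rows and can drift out of sync.
CREATE TABLE dbo.OrderDenormalized
(
    OrderId      int          NOT NULL PRIMARY KEY,
    CustomerName varchar(100) NOT NULL, -- duplicated for every order this customer places
    OrderDate    date         NOT NULL
);

-- Normalized: the name is stored once and referenced by key.
CREATE TABLE dbo.Customer
(
    CustomerId   int          NOT NULL PRIMARY KEY,
    CustomerName varchar(100) NOT NULL
);

CREATE TABLE dbo.[Order]
(
    OrderId    int  NOT NULL PRIMARY KEY,
    CustomerId int  NOT NULL REFERENCES dbo.Customer (CustomerId),
    OrderDate  date NOT NULL
);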

This single point of data should also be as simple as possible, but not simpler (to shamelessly steal an Einstein quote). If you ever need to break down a value in a SQL statement to use it (say, with SUBSTRING), then you are probably not normalized enough. There are reasonable counter-examples where this breaks down, such as the occasional search inside a large string object, but even that can depend on the purpose of the large string. Having data at this (so-called) "atomic" level (it cannot reasonably be broken down further) means less code to manage duplicate data, whether copied or summarized for performance. (It will also matter when you want to index some data to make a search go faster.)
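A quick sketch of what that looks like in practice (again, hypothetical table and column names): when last name and first name are mashed into one column, every query has to perform string surgery and no index on the column can help the search; with atomic columns the query is trivial and indexable.

-- Non-atomic: 'LastName, FirstName' in one column forces string surgery
-- in every statement that needs only part of the value.
SELECT SUBSTRING(FullName, 1, CHARINDEX(',', FullName) - 1) AS LastName
FROM   dbo.PersonNonAtomic
WHERE  SUBSTRING(FullName, 1, CHARINDEX(',', FullName) - 1) = 'Smith';

-- Atomic: each column holds one simple value, so the query (and an
-- index on LastName) can do the work directly.
SELECT LastName, FirstName
FROM   dbo.Person
WHERE  LastName = 'Smith';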

The reason this is so important is that normalized data follows the usage pattern the relational engine was built to handle. Most of us would never use a hammer to drive a screw, and we would be even less likely to drive a nail with a screwdriver. Why? Because when we were learning to use tools, we learned that these were not good ideas. As young children it would not have been surprising to see us smacking a nail with the wrong end of a hammer, but not as adults. Sometimes you might do something less than perfect (like using a wrench as a hammer), but as a rule you know that the "best" practice is to have a bag full of tools that you carry around the house to get jobs done.

So why do people expect SQL Server to do a good job as a data manipulation tool when they won't learn how it is supposed to be used? Ever wondered why functional developers, DBAs, and data architects clash? Because the percentage of data architects and DBAs who understand how SQL Server works is inversely proportional to the percentage of functional developers who do. It is not a rare occasion for me to see a messy data solution to a problem and be able to formulate a SQL solution that works in a much easier fashion while still following good relational data patterns. To the person who doesn't "get" relational, very iterative, one row = one object instance solutions feel normal, leading to lots of singleton retrieves from the database to do data manipulation. But relational programmers know that taking the normalized structures and letting the SQL engine do all of the ugly work on our set-based queries (almost always) provides a solution that works, and works fast.
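As a rough illustration of the difference (hypothetical table and columns again, and assuming an Archived flag on the order table), the iterative approach fetches and touches one row at a time, while the set-based version hands the whole problem to the engine in a single statement:

-- Row-by-row: one object instance at a time, lots of singleton work.
DECLARE @OrderId int;
DECLARE order_cursor CURSOR FOR
    SELECT OrderId FROM dbo.[Order] WHERE OrderDate < '2010-01-01';
OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderId;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.[Order] SET Archived = 1 WHERE OrderId = @OrderId;
    FETCH NEXT FROM order_cursor INTO @OrderId;
END;
CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based: let the engine do the ugly work in one statement.
UPDATE dbo.[Order]
SET    Archived = 1
WHERE  OrderDate < '2010-01-01';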

But it all starts with shaping the data architecture in the correct manner (unlike the data architect, who can be almost any shape at all).