The first pillar – A Coherent Design

One of the definitions on wiktionary.org for coherence is “a logical arrangements of parts”. In my initial post, I defined “coherent” for database designs in the following manner: cohesive, comprehendible, standards based, names/datatypes all make sense, needs little documentation.  Both definitions share one specific common theme: “logical”.  To me, the crux of this statement is that whenever I do something in the following list:

  • Query for some data
  • Modify a piece of data
  • Add a column
  • Add a table
  • Change how data is validated
  • Understanding what happens in a given modification statement (like a cascading modification, either with DRI or with a trigger)

I should not have to “think” too hard about how to accomplish it.  One of the better books I had to read when I was working for an Internet startup was “Don’t Make Me Think” by Steve Krug. It was a bunch of mostly horrible websites with discussion of why they were horrible (many sucked like a battalion of Roombas). In most cases it was because when you wanted to do something, you had to think about it how to accomplish a task, requiring you to go to non-standard places to to common tasks. In the list of tasks, the first two were things that an end user might do.  The others were more oriented towards what an admin might do, but in all cases, time spent futzing around trying to figure out how the previous person did stuff is wasted time that you could be surfing on ebay looking for Futurama figurines.

The fact is, coherency almost is so broad as to be what the entire concept should be labeled.It probably could include ALL of the pillars as a title “Coherent Database Design” or something like it. I will stick to the name for now, but since it is a work in progress, if you are wanting to steal my ideas you should wait until I finish them.

So let’s look at each of the points that I outlined in the first paragraph for a coherent design:

Cohesive

By cohesive, I mean to say that all of the pieces and parts come together to provide the expected functionality. You don’t have one database that does twenty fundamentally different things, like having your customer base and your CD collection documented in the same database. A database is a container, generally used to facilitate backing up and managing like sets of data. Knowing where to break a database into smaller databases is an art, as is using schemas to break down fundamental groups inside the database. 

The rule of thumb that I consider is that I want to have the objects that work together in one place, but not necessarily ones that would never come in contact with one another.  Clearly backup and recovery are a big reason to have one database, as is consolidating objects.  So the key here is not that you can easily come up with some hard and fast rule, but if you have 200 hundred tables and 200 databases, you probably don’t have a cohesive strategy for your database design.

Names/datatypes all make sense

Now we get to the meat of it all. WAY too many designers are far too careless with the names of objects. I mean, take the name statusCode. This can only represent the status of the “thing” the row represents without confusion.  If it is really the status of another row, or another column, if it holds the last name of the user, or any other thing that is not the status, it will be confusing. And unfortunately, it is rarely going to be the case that it is so obvious that the name isn’t right. Too often the next user will treat it like the name implies, even if the documentation states that the meaning has changed.  You know what, if the meaning changes, CHANGE THE FREAKING NAME. The cost to change the name will by no means ever be larger than the amount of time it takes keeping it straight from then on.

Names are just the start though.  Datatypes are another big piece of the picture. Too many databases have about two distinct types in the database.  varchar(50) and varchar(max). Sure, you can store everything from numbers, values, names, etc in these two datatypes, but there is two problems with this.

1. It isn’t very efficient for the engine

2. It doesn’t use the database to help make sure only proper values are stored (sure you could use check constraints and triggers, but don’t!)

3. It doesn’t give the user a clue of what can go in the column.

Number 3 is the most important in the long run, while the other two are quite major. In any case, the better you name columns/tables/etc and the more focused you can be when creating types the better.

Standards based

First off, any standard is better than NO standard (and no, the standard to have no standard is cheating in the same manner that wishing for more wishes makes Genie mad.) Even the wickedly evil standards that some companies that produce software use to try to hide data structures from their customers is better than nothing (for the record, all you are doing is making it harder to get to, not impossible, and frankly making the people who work with the software annoyed to start making up stories about how the software is “limited”, “garbage”, and should be “replaced, even with inferior software.”) Having a common data dictionary that is used to define the meaning of your tables, columns, and various objects that cannot be sussed from the names is essential.

Followed standards ensure that when you start to look for something that you can determine the meaning in a common manner. Obviously the best standards are the ones that make it the easiest to get an understanding of the system  without going to the documentation.

Not the use of the word “followed”. Take the practice of naming stored procedures usp_ plus a name. Ok, so this is a user stored procedure. But someone names one without the usp_, now what? Is this not a stored procedure? So as much as I abhor such naming standards, not following the standards you have is a major issue that will cause people to have to think.  For the record, I am against any typing placed in the name, since you can easily discover from context or catalog what type something is. (Sometimes including the word view in a view is acceptable since it makes English sense too.)

Comprehensible

Much like coherent, this one is pretty big and encompasses some of the other points. The ability to comprehend what the databases/objects are there for and how to use them the less you have to document now, and the less you have to look up in the documentation later. The easier it is to just look at the structure and know what each object means the better.  The more you can comprehend from just looking at names of things the better.  Seeing that the name of the column is statusCode is easy enough, so you would expect to see a single value, probably a textual value, that tells you the status of the thing the row represents, right? If this isn’t the purpose of the column, then what? Someone might have the value 21 in the column.  What does this mean? Well, it could mean that this is a foreign key reference to the statusCode table, this could be acceptable enough, though I would have called it statusCodeId to make it clear that it was just an identifier.)  Still other, far more evil people might mean that it meets three values stored bitwise, in this case: 1 + 4 + 16.

Needs little documentation

Lastly, you can tell quality of a software package by how often you need to go to the documentation. Nerds don’t real instructions for a reason. All VCR’s work basically the same way. Toasters, toast the same way, even my Hamilton Beach combination toaster/toaster oven was pretty dang simple to operate.

Your database design should be the same way. Just basic bits and pieces of documentation, in standard locations, that people can access to find out the things that don’t work in a straightforward manner.  Give me too much information and I will ignore it. Why don’t I read the toaster instructions? Because it will tell me stuff that a 10 year old girl who wants to be an artist could figure out.  Sure there is probably a feature that I am not getting because I don’t read the docs, but I will risk it to avoid being told one more time not to use the toaster near my bathtub. Thanks for that tidbit of information..

The more you can do to avoid the need to document by making the object coherent, the better off we all will be.

Summary

I like a puzzle as much as the next guy.  Suduko, mazes, adventure video games, etc are all fun.  But the fact is, each of these have solutions that are well documented. A lot of database solutions (and really any computer solutions) have a very puzzle like quality in that you need to hunt down what you need by following very faint clues that lead you to a solution.  An afternoon for a developer starts to feel like an adventure game, speaking to one person who hints of someone else you need to talk to.  “Take an orange to bob the wizard architect, and ask him…” “Orange?” you ask.” “He likes oranges.”  Okey doke. Then you have to roll a twelve sided die to see how cooperative he will be.  Fun as an aside, but unlike a puzzle, there is not necessarily a solution to the problem like a proper puzzle creator will have spent much time cautiously setting up.

All that to say… make sure that your database makes sense to you now, and also consider future you. Does this stuff really make sense? Make sure to get a second opinion/design review from someone what does such a good job that they start to get on your nerves. And don’t be afraid to change things later when everyone else is confused by part of your design too. It is not your ego that is important….getting it right is.

Edited: Comprehendible wasn’t actually a word…Comprehensible on the other hand…