Normalisation and ‘Anima notitia copia’ (Soul of the Database)

(A Guest Editorial for Simple-Talk)

The other day, I was staring  at the sys.syslanguages  table in SQL Server with slightly-raised eyebrows .


I’d just been reading Chris Date’s  interesting book ‘SQL and Relational Theory’. He’d made the point that you’re not necessarily doing relational database operations by using a SQL Database product.  The same general point was recently made by Dino Esposito about ASP.NET MVC.  The use of ASP.NET MVC doesn’t guarantee you a good application design: It merely makes it possible to test it. The way I’d describe the sentiment in both cases is ‘you can hit someone over the head with a frying-pan but you can’t call it cooking’.

SQL enables you to create relational databases. However,  even if it smells bad, it is no crime to do hideously un-relational things with a SQL Database just so long as it’s necessary and you can tell the difference; not only that but also only if you’re aware of the risks and implications. Naturally, I’ve never knowingly created a database that Codd would have frowned at, but around the edges are interfaces and data feeds I’ve written  that have caused hissy fits amongst the Normalisation fundamentalists. Part of the problem for those who agonise about such things  is the misinterpretation of Atomicity.  An atomic value is one for which, in the strange virtual universe you are creating in your database, you don’t have any interest in any of its component parts.  If you aren’t interested in the electrons, neutrinos,  muons,  or  taus, then  an atom is atomic. In the same way, if you are passed a JSON string or XML, and required to store it in a database, then all you need to do is to ask yourself, in your role as Anima notitia copia (Soul of the database) ‘have I any interest in the contents of this item of information?’.  If the answer is ‘No!’, or ‘nequequam! Then it is an atomic value, however complex it may be.  After all, you would never have the urge to store the pixels of images individually, under the misguided idea that these are the atomic values would you?  I would, of course,  ask the ‘Anima notitia copia’ rather than the application developers, since there may be more than one application, and the applications developers may be designing the application in the absence of full domain knowledge, (‘or by the seat of the pants’ as the technical term used to be). If, on the other hand, the answer is ‘sure, and we want to index the XML column’, then we may be in for some heavy XML-shredding sessions to get to store the ‘atomic’ values and ensure future harmony as the application develops.

I went back to looking at the sys.syslanguages table. It has a months column with the months in a delimited list


This is an ordered list. Wicked? I seem to remember that this value, like shortmonths and days, is treated as a ‘thing’. It is merely passed off to an external  C++ routine in order to format a date in a particular language, and never accessed directly within the database. As far as the database is concerned, it is an atomic value.  There is more to normalisation than meets the eye.