Hadoop, NOSQL, and the Relational Model

(Guest Editorial for the IT Pro/SysAdmin Newsletter)

Whereas Relational Databases fit the world of commerce like a glove, it is useless to pretend that they are a perfect fit for all human endeavours. Although, with SQL Server, we’ve made great strides with indexing text, in processing spatial data and processing markup, there is still a problem in dealing efficiently with large volumes of ephemeral semi-structured data.

Key-value stores such as Cassandra, Project Voldemort, and Riak are of great value for ephemeral data, and seem of equal value as a data-feed that provides aggregations to an RDBMS. However, the Document databases such as MongoDB and CouchDB are ideal for semi-structured data for which no fixed schema exists; analytics and logging are obvious examples.

NoSQL products, such as MongoDB, tackle the semi-structured data problem with panache. MongoDB is designed with a simple document-oriented data model that scales horizontally across multiple servers. It doesn’t impose a schema, and relies on the application to enforce the data structure. This is another take on the old ‘EAV’ problem (where you don’t know in advance all the attributes of a particular entity) It uses a clever replica set design that allows automatic failover, and uses journaling for data durability. It allows indexing and ad-hoc querying.

However, for SQL Server users, the obvious choice for handling semi-structured data is Apache Hadoop. There will soon be an ODBC Driver for Apache Hive .and an Add-in for Excel. Additionally, there are now two Hadoop-based connectors for SQL Server; the Apache Hadoop connector for SQL Server 2008 R2, and the SQL Server Parallel Data Warehouse (PDW) connector. We can connect to Hadoop process the semi-structured data and then store it in SQL Server.

For one steeped in the culture of Relational SQL Databases, I might be expected to throw up my hands in the air in a gesture of contempt for a technology that was, judging by the overblown journalism on the subject, about to make my own profession as archaic as the Saggar makers bottom knocker (a potter’s assistant who helped the saggar maker to make the bottom of the saggar by placing clay in a metal hoop and bashing it). However, on the contrary, I find that I’m delighted with the advances made by the NoSQL databases in the past few years. Having the flow of ideas from the NoSQL providers will knock any trace of complacency out of the providers of Relational Databases and inspire them into back-fitting some features, such as horizontal scaling, with sharding and automatic failover into SQL-based RDBMSs. It will do the breed a power of good to benefit from all this lateral thinking.