PASS Summit 14 Dispatches: DocumentDB

During the PASS Summit 14 keynote, TK “Ranga” Rengarajan briefly mentioned Microsoft’s DocumentDB, a new NoSQL database. I was hoping to hear more. It’s an Azure-hosted JSON document data store, and it seems to be an attempt to marry the schema ‘flexibility’ and easy scalability that developers crave from their databases with the transactional capabilities of a relational database.

Web or mobile applications often need to collect a lot of data, and the structure of this data can be ill-defined; the next ‘row’ won’t necessarily look anything like the previous one. This calls for a more ‘flexible’, and more easily scalable, data store than a traditional relational database can offer. Most NoSQL databases boast of being “schema-free”, so they have the requisite ‘flexibility’, and certainly aren’t burdened with the costs associated with such niceties as enforcing constraints or maintaining keys. They also offer built-in horizontal scaling, distributing and replicating parts of the dataset across numerous ‘utility’ machines and then using Map/Reduce functionality to query it.

Ranga’s point was that every device looks at some different part of the data, and all must have ‘consistency’ in the data they share, with this ‘consistency’ enabled via the cloud. Eventually, at least.

Without ACID transactions, there is no guarantee that one transaction won’t see the partial effects of another. If the data is only ‘eventually consistent’, a model employed by many NoSQL databases, then in theory a query could read invalid data values, because the full effects of a set of writes haven’t yet propagated to all parts of the distributed system. In practice, it’s usually not as bad as this sounds. In some NoSQL stores, it might mean that queries read a correct but stale data value, rather than the true current value. Also, some NoSQL systems do use distributed transactions to enforce ACID properties during modifications, which works most of the time. Neither proposition is ideal.
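To make the ‘correct but stale’ case concrete, here is a toy sketch in Python. It is not modelled on any particular NoSQL product; the class and method names (EventuallyConsistentStore, propagate and so on) are invented purely for illustration, and the ‘replication lag’ is simulated by delivering writes to the second replica only when propagate() is called.

```python
class Replica:
    """One copy of the data; it only sees a write once that write is delivered."""

    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Hypothetical two-replica store with manually triggered replication."""

    def __init__(self, replica_count=2):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # writes accepted but not yet delivered everywhere

    def write(self, key, value):
        # The write is acknowledged as soon as the first replica has it.
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            self.pending.append((replica, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Deliver all outstanding writes, in the order they were accepted.
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.propagate()                                # both replicas now agree on 100
store.write("balance", 80)                       # replica 1 has not seen this yet
stale = store.read("balance", replica_index=1)   # a valid value, but out of date
store.propagate()
fresh = store.read("balance", replica_index=1)   # replicas have converged
print(stale, fresh)  # 100 80
```

The stale read returns a value that was once true, not garbage, which is why the problem is often tolerable in practice. It is still the wrong answer for anything that needs the current value.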

DocumentDB tries to offer an answer to these apparently conflicting requirements of flexibility and scalability versus data consistency. It is schema-free and designed to ‘scale linearly’. However, it also comes with many features comforting to the DBA. We can access data using a very SQL-like syntax. We can create stored procedures, triggers and user-defined functions, albeit in JavaScript (no groaning at the back).

Crucially, it also seems to offer a high degree of control over the transactional consistency. We register a JavaScript stored procedure with a collection of documents, and DocumentDB will execute transactions using Snapshot isolation, across any documents in a collection (although not between collections).
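The all-or-nothing idea behind this can be sketched as a toy model in Python. This is emphatically not DocumentDB’s API or implementation, and real DocumentDB stored procedures are written in JavaScript. The Collection class, run_transaction and transfer are hypothetical names invented for illustration. The sketch also covers only one half of snapshot isolation, namely working on a private snapshot and committing everything or nothing, and it omits the write-conflict detection a real engine needs when transactions run concurrently.

```python
import copy


class Collection:
    """Toy in-memory collection of JSON-like documents keyed by id."""

    def __init__(self, documents):
        self.documents = documents

    def run_transaction(self, procedure):
        # Work on a private snapshot so readers never see partial changes.
        snapshot = copy.deepcopy(self.documents)
        try:
            procedure(snapshot)
        except Exception:
            return False           # abort: discard the snapshot, nothing changes
        self.documents = snapshot  # commit: all changes become visible at once
        return True


def transfer(amount):
    """Build a 'stored procedure' that touches two documents in one collection."""

    def procedure(docs):
        docs["alice"]["balance"] -= amount
        if docs["alice"]["balance"] < 0:
            raise ValueError("insufficient funds")
        docs["bob"]["balance"] += amount

    return procedure


accounts = Collection({"alice": {"balance": 50}, "bob": {"balance": 10}})
ok = accounts.run_transaction(transfer(30))      # commits: alice 20, bob 40
failed = accounts.run_transaction(transfer(30))  # aborts: alice would go negative
print(ok, failed, accounts.documents)
```

The second transfer fails after it has already debited one document, yet the collection is left exactly as the first transfer committed it. That is the guarantee that matters here: a procedure spanning several documents either applies completely or not at all.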