Relational Databases and Solid State Memory: An Opportunity Squandered?

“But the hour cometh, and now is” John 4:23

“Those who cannot remember the past are condemned to repeat it.” George Santayana, 1905

“I would remind you that extremism in the defense of liberty is no vice! And let me remind you also that moderation in the pursuit of justice is no virtue.” Barry Goldwater, 1963

“I’m not sure I’ve even got the brains to be President.” Barry Goldwater, 1964

I first became aware of solid-state drives (SSDs) more than a decade ago, when I came across the website for Texas Memory Systems. At that time, SSDs were, by and large, expensive dynamic memory (DRAM) devices used as accelerators, largely for database systems. Flash SSD was used, to a small but noticeable extent, for military-standard (MIL-SPEC) and other ruggedized embedded applications.

About five years ago, STEC emerged as the leading boutique vendor of “Enterprise” flash SSD, building everything but the NAND flash memory themselves and using only single-level cells (SLC). These were “Enterprise Serious” parts, not the slower-than-HDD nonsense that was showing up in laptops. STEC were the major supplier to EMC, and the sole supplier for a period, according to legend. At first, their stock price soared, then didn’t, then soared again. But STEC have recently reported their quarterly results, and life isn’t soaring. Why might this be, and what effect, if any, might there be on the fit between SSD and relational database systems?

On the whole, I’m afraid the game is over. There is historical precedent, alas. In the 1960s, IBM released Direct Access Storage Device (DASD) disc drives, which were integrated with the System/360 architecture, and for which they defined direct access file methods. COBOL coders, however, ignored the direct aspect of the device, and treated it as faster tape. Sequential file-processing continued on its merry way, not least because there was already nearly a decade’s worth of existing COBOL (and Autocoder and FLOW-MATIC and others) out there that no one wanted to re-build. Codd, on the other hand, realized what random access meant to datastores, and wrote his first paper.

Likewise, some of us saw the hand-in-glove fit of SSD and RDBMS early on when flash SSD was first emerging. On more than one occasion, I wrote to vendors pleading with them to see where the market for flash SSD really was. But, again, coders saw flash SSD as a cheaper way to speed up their ‘Row-by-agonising-row’ (RBAR) code, just as their grandfathers used DASD 40 years earlier. Early on in STEC’s push into the arena, there was much talk of one-for-one replacement of HDD with SSD. Those with more level heads dismissed this. Not only did it fail to happen but, outside of sub-desktops where an SSD may be the sole storage, flash SSD has largely been relegated to Tier 1 storage, a term invented for SSD, so far as I can tell.
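For readers who haven’t met the term, the difference between RBAR and set-based code can be sketched in a few lines. This is only an illustration, assuming Python’s bundled sqlite3 module and a hypothetical `order_line` table invented for the purpose; none of the table, column, or function names come from any real system.

```python
import sqlite3

def make_db():
    """Build a tiny in-memory database with a hypothetical order_line table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_line ("
        " id INTEGER PRIMARY KEY, qty INTEGER, unit_price REAL, line_total REAL)"
    )
    conn.executemany(
        "INSERT INTO order_line (qty, unit_price) VALUES (?, ?)",
        [(1, 2.5), (4, 10.0), (3, 1.0)],
    )
    return conn

def rbar_update(conn):
    """Row-by-agonising-row: fetch every row, compute in the client,
    then issue one UPDATE statement per row."""
    rows = conn.execute("SELECT id, qty, unit_price FROM order_line").fetchall()
    for row_id, qty, price in rows:
        conn.execute(
            "UPDATE order_line SET line_total = ? WHERE id = ?",
            (qty * price, row_id),
        )

def set_based_update(conn):
    """Set-based: a single statement describes the result for all rows
    and leaves the access strategy to the engine."""
    conn.execute("UPDATE order_line SET line_total = qty * unit_price")

def totals(conn):
    """Return the computed line totals in primary-key order."""
    return [r[0] for r in conn.execute(
        "SELECT line_total FROM order_line ORDER BY id")]
```

Both functions leave the table in the same state. The first treats the database as a file to be read record by record, which is the ‘faster tape’ habit; the second states the result once and lets the engine decide how to get there.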

None of this would make sense in a rational world. In such a world you’d find that fifth-normal-form (5NF) catalogs, made feasible by SSD storage, would become the norm, because of nearly cost-free joins and perhaps an order of magnitude less data to process. But the SSD vendors haven’t taken this as a selling point for their devices. They seem content to sit on their hands and let the clients tell them what to do. While the rule that “the customer is always right” applies appropriately to restaurants, it surely doesn’t apply to fields which depend on science and technology. Would one, for example, argue with the neurosurgeon over how to remove that pesky tumor? There was a time when vendors and consultants actually brought more experience and knowledge to the table (sigh). But it would appear that the vendors, both the STECs and the EMCs to whom they sell, have consulted their abacuses and decided that any reduction in sales that might result from intelligent use of SSDs would be more harmful than selling only handfuls of them for Tier 1 intermediate storage.

I think they’re wrong. I think the big market lies in harnessing the fully relational database, and locking in clients to 5NF on SSD. Hard-disk drives (HDD) would never be able to catch up. But the IBMs and EMCs clearly don’t want to do that. I suppose they see a threat in the reduced data footprint of high-normal-form schemas. By analogy: the RDBMS vendors “extend” the ANSI standard with what they assert to be useful features, and then encourage developers to use them. Oracle added the CONNECT BY syntax decades ago to simplify querying hierarchical structures (the ANSI syntax we have today was concocted by IBM, likely to avoid any reparations to Oracle); Bill-of-Material (BOM) processing has been a significant requirement in commercial software from the beginning of time, and BOM data has a (sort of) hierarchical structure. In the same way, there’s a very good reason for an STEC to “partner” with Microsoft or IBM or Oracle to promote this method of using SSD: it’s different from RBAR processing and better in most cases.
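To make the BOM example concrete, here is a minimal sketch of a parts explosion written with the ANSI-style recursive common table expression, which is the job Oracle’s CONNECT BY was added to do. It assumes Python’s bundled sqlite3 module (SQLite has supported `WITH RECURSIVE` since version 3.8.3); the `bom` table, the bicycle parts, and the function names are all made up for illustration.

```python
import sqlite3

def build_bom():
    """Create a hypothetical bill-of-material table: one row per
    (parent assembly, child part, quantity per parent)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bom (parent TEXT, child TEXT, qty INTEGER)")
    conn.executemany(
        "INSERT INTO bom VALUES (?, ?, ?)",
        [
            ("bike", "wheel", 2),
            ("bike", "frame", 1),
            ("wheel", "spoke", 32),
            ("wheel", "rim", 1),
        ],
    )
    return conn

# Recursive CTE: the anchor member selects the direct children of the
# root; the recursive member walks down one level at a time, multiplying
# quantities along the way.
EXPLODE_SQL = """
WITH RECURSIVE explosion(part, total_qty) AS (
    SELECT child, qty
    FROM bom
    WHERE parent = ?
    UNION ALL
    SELECT b.child, e.total_qty * b.qty
    FROM bom AS b
    JOIN explosion AS e ON b.parent = e.part
)
SELECT part, SUM(total_qty)
FROM explosion
GROUP BY part
ORDER BY part
"""

def explode(conn, root):
    """Return (part, total quantity) for every part beneath root."""
    return conn.execute(EXPLODE_SQL, (root,)).fetchall()
```

Calling `explode(build_bom(), "bike")` yields one frame, two rims, 64 spokes and two wheels. The final `SUM` is what copes with the ‘sort of’ in hierarchical: a part used in several assemblies appears on more than one path, and its quantities are added together.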

The search-dominated thinkspiel in application development will pass. After all, there’s only so much need for toy applications, such as Facebook and Twitter, which generate infinite bytes; corporate-level systems, even with 5NF, are likely to stay at the gigabyte level. For transaction-dependent applications, Codd is still right. “Eventual consistency” is a farce in which half the audience don’t understand the jokes.

I put fingers to keyboard for this missive a few days ago. I’ve just now gotten this possible reprieve from my dour foreshadowing of SSD’s future. While it won’t be cheap, Fusion-io has built an SSD large enough to be the primary store for corporate level systems, one that should handle many OLTP databases whole.

To some extent, more so when I’m depressed, I blame database developers for not having the raw guts to stand up to the hectoring of the NoSQL, XML, and all the other RBAR/flatfile zealots. If we abdicate our professional training and experience to praise whichever New Clothes the Emperor is wearing, are we little more than tenant farmers? Instead, we should be exploiting these new technologies to the hilt by creating uncompromising relational databases, now that SSD has at last given us the hardware to catch up with Codd’s original relational vision.

References

Codd, E.F. (1970). “A Relational Model of Data for Large Shared Data Banks”. Communications of the ACM 13 (6): 377-387. doi:10.1145/362384.362685

Codd, E.F. (1990). The Relational Model for Database Management (Version 2 ed.). Addison Wesley Publishing Company. ISBN 0-201-14192-2.

About the author

Robert Young

Robert Young has been practicing data science since before the term was invented. He has worked for large and small organizations, building applications as varied as OLTP for food distribution and medical prequalification software, with most RDBMS (from dBaseII to mainframe DB2) and statistical languages (from BMDP to R). His current obsession is marrying the two.
