Hekaton in 1000 Words

The SQL Server 2014 In-Memory OLTP engine (a.k.a. Hekaton) is a radical change for relational databases. This article, an excerpt from Kalen Delaney's book "SQL Server Internals: In-Memory OLTP", provides a brief overview of what Hekaton is and why it's important.

When the SQL Server team set out to design and build a database engine specialized for in-memory workloads, their vision was of a new relational database engine that would be 100 times faster than the existing SQL Server engine, hence the codename Hekaton, from the Greek word (ἑκατόν) meaning 100.

SQL Server 2014 delivers this new engine component, now called SQL Server In-Memory OLTP, and so allows us to work with memory-optimized tables and indexes, in addition to the disk-based tables and indexes that SQL Server has always provided. These new data structures are designed from the ground up to exploit terabytes of available memory and high numbers of processing cores. This potentially marks the beginning of the end of “I/O-bound” SQL Server.

In-Memory OLTP is integrated with the SQL Server relational engine, allowing us to access in-memory data transparently, using standard interfaces such as T-SQL and SSMS. However, its internal behavior and capabilities are very different from those of the standard relational engine. Everything you knew about how SQL Server stores and accesses data is different in Hekaton. Everything you understood about how multiple concurrent processes are handled needs to be reconsidered.

What is a memory-optimized table?

A memory-optimized table is one that SQL Server stores entirely in memory, along with all of its indexes. User processes accessing these in-memory data structures will therefore always find the required data in memory.
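To make this concrete, here is a minimal sketch of what creating such a table looks like in SQL Server 2014. The database, filegroup, path and table names are purely illustrative; the essential pieces are the memory-optimized filegroup, the hash index backing the primary key, and the MEMORY_OPTIMIZED = ON option.

-- Illustrative names throughout; assumes a database called HekatonDemo already exists.
-- A memory-optimized filegroup holds the checkpoint files used to persist durable
-- memory-optimized tables.
ALTER DATABASE HekatonDemo
    ADD FILEGROUP HekatonDemo_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE HekatonDemo
    ADD FILE (NAME = 'HekatonDemo_mod', FILENAME = 'C:\Data\HekatonDemo_mod')
    TO FILEGROUP HekatonDemo_mod;
GO

USE HekatonDemo;
GO

-- In SQL Server 2014, every index on a memory-optimized table is declared inline,
-- at CREATE TABLE time; here a hash index backs the primary key.
CREATE TABLE dbo.OrderHeaders
(
    OrderID   INT       NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Quantity  INT       NOT NULL,
    OrderDate DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO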

It sounds like a glorified DBCC PINTABLE…

SQL Server In-Memory OLTP bears no relation whatsoever to DBCC PINTABLE, a feature available in older versions, whereby if we “pinned” a table, SQL Server would not remove any of its data pages from memory. These pinned tables were no different from any other disk-based tables. They required the same amount of locking, latching and logging, and they used the same index structures, which also required locking, latching and logging.

By contrast, table and index structures in Hekaton are completely different from their disk-based counterparts. SQL Server can guarantee the ACID properties of every transaction, without taking any locks or latches on memory-optimized tables and indexes, during reading or writing. Readers don’t block writers, and writers don’t block readers, or other writers.

Also, logging changes to memory-optimized tables is usually much more efficient than logging changes to disk-based tables.

What’s so different about how data is stored in Hekaton?

The standard relational engine is architected to assume that the data resides on disk. It reads and writes 8 KB data pages, as a unit, to and from disk, and stores them in extents. Each page “belongs” to a particular object. This process can generate a lot of random I/O and incurs a high latency cost.

For memory-optimized tables, there are no data pages, and no extents. There is no notion of data rows being written to a particular location that “belongs” to a specified object. There are just “data rows,” written to memory, in the order the transactions occurred, with each row containing an index “pointer” to the next row. All “I/O” is then in-memory scanning of these structures.

Many versions of the same row can coexist at any given time. This allows concurrent access of the same row, during data modifications, with SQL Server making available the row version relevant to each transaction according to the time the transaction started relative to the timestamp values stored in the header of each row version.

So there really is no blocking in Hekaton, ever?

Hekaton removes the standard assumption that the data resides on disk. For memory-optimized tables, SQL Server never needs to acquire latches, nor perform I/O to retrieve data from disk. Furthermore, SQL Server can guarantee to preserve the ACID properties of all transactions without acquiring any locks. Therefore, no transaction will ever be blocked waiting to acquire a lock.

The row structure described above underpins a new, truly optimistic concurrency model, using a multi-version concurrency control (MVCC) system in which SQL Server maintains multiple row versions and determines the correct row version that each concurrent transaction should access.

In the MVCC model, writers don’t ‘block’ writers. If transaction Tx2 tries to modify a row currently being modified by Tx1, SQL Server optimistically assumes that Tx1 will commit, raises an update conflict, and aborts Tx2 immediately (rather than block it, awaiting the final outcome of Tx1).
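As a sketch of this behavior, using the hypothetical dbo.OrderHeaders table from earlier, consider two sessions that try to modify the same row. The WITH (SNAPSHOT) hint is what SQL Server 2014 requires when a memory-optimized table is accessed inside an explicit transaction from interpreted T-SQL.

-- Session 1: opens a transaction and updates a row, but does not yet commit.
BEGIN TRAN;
UPDATE dbo.OrderHeaders WITH (SNAPSHOT)
SET    OrderDate = SYSDATETIME()
WHERE  OrderID = 42;

-- Session 2: tries to update the same row while session 1 is still open.
-- Rather than blocking, the statement fails immediately with an update
-- conflict (error 41302) and the transaction is rolled back.
BEGIN TRAN;
UPDATE dbo.OrderHeaders WITH (SNAPSHOT)
SET    Quantity = Quantity + 1
WHERE  OrderID = 42;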

Also, readers don’t block writers, nor vice versa. SQL Server optimistically assumes that concurrent readers and writers won’t ‘interfere’, but performs validation checks, post-commit, to detect any potential violations of the properties specified by the transaction isolation level.

However, this does not mean there is never any waiting when working with memory-optimized tables in a multi-user system. A complex or long running transaction on an OLTP system can still cause problems, even with Hekaton. For example, it may cause waits while dependencies resolve during the validation phase. Transactions might also need to wait for log writes to complete. However, in either case the waits should be of very short duration.

Is Hekaton really 100x faster?

You may not find that Hekaton is 100 times faster, but it could be many times faster, depending on the workload.

A key factor is that memory-optimized tables are natively compiled. SQL Server holds in memory not only the table and index structures, but also a set of DLLs for accessing and modifying these data structures. The table metadata is compiled into each DLL as a set of native-code algorithms that describe precisely the row format for the table and how to traverse its indexes. Hekaton also offers natively compiled stored procedures, which generate far fewer CPU instructions for the engine to execute than the equivalent interpreted T-SQL stored procedures, and which can be executed directly by the CPU, without the need for further compilation or interpretation.

When we access memory-optimized tables from natively compiled stored procedures, we have a highly efficient data access path.
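As an illustration, a minimal natively compiled procedure against the hypothetical dbo.OrderHeaders table might look like the sketch below; NATIVE_COMPILATION, SCHEMABINDING, an EXECUTE AS clause and the ATOMIC block are all required by SQL Server 2014.

CREATE PROCEDURE dbo.InsertOrderHeader
    @OrderID  INT,
    @Quantity INT
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
    (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')

    -- The body is compiled to native code; at run time it executes directly,
    -- with no further parsing or interpretation.
    INSERT INTO dbo.OrderHeaders (OrderID, Quantity, OrderDate)
    VALUES (@OrderID, @Quantity, SYSDATETIME());
END;
GO

-- Called like any other stored procedure:
EXEC dbo.InsertOrderHeader @OrderID = 1, @Quantity = 10;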

What’s the Catch?

Currently, there are quite a few limitations on the range of data types supported for memory-optimized tables, restrictions around the use of indexes and constraints (no FOREIGN KEY or CHECK constraints), and limitations on the T-SQL language constructs allowed inside a natively compiled stored procedure. In addition, there are other requirements, such as the need to specify a binary collation for any indexed character columns on memory-optimized tables, which could make migrating to Hekaton difficult in some cases.
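For instance, the binary collation requirement appears as soon as you try to index a character column on a memory-optimized table. In a sketch like the one below (names again illustrative), omitting the COLLATE clause would cause the CREATE TABLE statement to fail in SQL Server 2014.

CREATE TABLE dbo.Customers
(
    CustomerID   INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    -- Indexed character columns must use a binary (BIN2) collation in SQL Server 2014.
    CustomerCode CHAR(10) COLLATE Latin1_General_100_BIN2 NOT NULL
        INDEX ix_CustomerCode NONCLUSTERED
    -- Note: no FOREIGN KEY or CHECK constraints can be declared on this table.
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);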

Some of these ‘barriers’ may disappear in subsequent releases.

This is an excerpt from the book SQL Server Internals: In-Memory OLTP. You can find out more about this book on its Simple-Talk page, or purchase it on Amazon (US | UK).