Dial in the scale: Amazon’s new DynamoDB

Dynamo is a fast and scalable proprietary key-value storage system that combines features of simple databases and distributed hash tables (DHTs), running on SSD storage to provide other Amazon services such as S3. It is not ACID-compliant or relational, but it is great for doing analysis on large amounts of simple data.

Last week Amazon announced their new non-relational database service, DynamoDB, and launched it in beta in the US-EAST-1 region. Amazon’s CTO, Werner Vogels, describes the product as “a fast and scalable NoSQL database service designed for internet scale applications”. DynamoDB has generated a lot of traffic and buzz on the Internet since its launch, but what is it really? Now that the dust is settling after the big launch, let’s introduce DynamoDB and some of the history that led to it.

What is Dynamo?

Dynamo is a storage system designed by Amazon for services where reliability and high availability are important, but giving up some consistency is an acceptable trade-off. For example, a customer’s shopping cart or preferences are important data that Amazon want to ensure are never lost, even in the case of network failure. In technical terms, Dynamo is an eventually consistent key-value store. Eventually consistent means that writes made to one node will propagate to all the other nodes over time, without making the application wait; key-value means that data is stored and fetched by key with no pre-defined database schema, which makes it easy to scale across many nodes. The technical details of Dynamo were published in a 2007 paper that went on to influence the design of many similar non-relational databases, including Cassandra (built at Facebook and now an Apache Software Foundation project), Voldemort (built at LinkedIn), and Riak (built by Basho, formed by engineers originally from Akamai). Dynamo itself isn’t something you can buy from Amazon: Vogels says that they use Dynamo “and similar Amazon technologies” to provide services such as S3.
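To make the eventual consistency idea concrete, here is a minimal in-memory sketch in Python. It is not Dynamo’s algorithm: the class names, the single update queue, and the last-writer-wins conflict rule are simplifying assumptions (the 2007 paper versions data with vector clocks instead). A write is acknowledged by one replica straight away, and the other replicas return stale data until the queued updates are delivered.

```python
"""Toy model of eventual consistency, loosely in the spirit of Dynamo.

Simplifying assumptions (not Dynamo's actual design): every replica holds
every key, updates travel through a single queue, and conflicts are settled
by last-writer-wins on a logical clock. The Dynamo paper instead uses vector
clocks and can return conflicting versions to the application.
"""
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def apply(self, key, timestamp, value):
        # Last-writer-wins: keep whichever version carries the newer timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    def __init__(self, replica_names):
        self.replicas = {name: Replica(name) for name in replica_names}
        self.pending = []  # queued (target replica, key, timestamp, value)
        self.clock = 0

    def put(self, key, value, via):
        # The replica that receives the write applies it and returns at once;
        # copies for the other replicas are only queued, not yet delivered.
        self.clock += 1
        self.replicas[via].apply(key, self.clock, value)
        for name in self.replicas:
            if name != via:
                self.pending.append((name, key, self.clock, value))

    def get(self, key, via):
        # Reads are served by a single replica, which may be stale.
        return self.replicas[via].read(key)

    def propagate(self):
        # Deliver every queued update, e.g. once a network partition heals.
        for name, key, timestamp, value in self.pending:
            self.replicas[name].apply(key, timestamp, value)
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore(["a", "b", "c"])
    store.put("cart:42", ["kindle"], via="a")
    print(store.get("cart:42", via="b"))  # None: replica b is not up to date yet
    store.propagate()
    print(store.get("cart:42", via="b"))  # ['kindle']
```

The write to the shopping cart is never lost, because replica `a` accepted it immediately; the price is that a read from replica `b` can briefly miss it.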


1. A steam-powered dynamo in operation

What about SimpleDB?

SimpleDB was Amazon’s first foray into offering ‘NoSQL as a service’. It is a popular choice for small applications, but is easily outgrown: each SimpleDB domain (its equivalent of a table) has a storage limit of 10GB and a low capacity for concurrent writes. To work around either limit, you have to partition your data across multiple domains yourself. The lack of an easy or affordable snapshot or backup facility, unpredictable performance, and the lack of other implementations (with the associated fear of lock-in) are all possible reasons that SimpleDB never really gained mainstream traction. Many people who investigated it soon moved on to one of the other NoSQL options.
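Partitioning by hand usually means routing each item to one of several SimpleDB domains by hashing the item name. The Python sketch below shows only that routing decision; the domain names and the choice of four domains are hypothetical, and no SimpleDB API is called.

```python
"""Client-side partitioning of items across several SimpleDB domains.

The domain names are hypothetical placeholders; this code only decides
which domain an item belongs in and makes no SimpleDB API calls.
"""
import hashlib

# Hypothetical domains; each SimpleDB domain is capped at 10GB.
DOMAINS = ["orders-0", "orders-1", "orders-2", "orders-3"]


def domain_for(item_name, domains=DOMAINS):
    # Use a stable hash (not Python's per-process randomized hash())
    # so every client routes a given item to the same domain.
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(domains)
    return domains[index]


def group_by_domain(item_names, domains=DOMAINS):
    # A request touching many items must be fanned out to each domain
    # involved, and the application has to merge the results itself.
    groups = {}
    for name in item_names:
        groups.setdefault(domain_for(name, domains), []).append(name)
    return groups


if __name__ == "__main__":
    for name in ["order-1001", "order-1002", "order-1003"]:
        print(name, "->", domain_for(name))
```

A plain modulo mapping is simple, but changing the number of domains moves most items to a different domain, so outgrowing the original layout means migrating data. That rebalancing problem is one of the things the Dynamo paper tackles with consistent hashing.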

What is DynamoDB?

DynamoDB takes the lessons of SimpleDB and offers ‘NoSQL as a service’ at unlimited scale. As with other Amazon services, you just specify that you want a table and what performance you want it to have, and it’s available to you moments later. Fill it with as much data as you want and Amazon will handle the rest, automatically scaling your data across multiple availability zones (within the same region) as required to provide the guaranteed performance you asked for. All DynamoDB data is stored on solid state drives (SSDs), and Amazon promises “single-digit millisecond latencies for database read and write operations”, regardless of the amount of data.

Vogels says in the press release that “Amazon DynamoDB is the result of everything we’ve learned from building large-scale, non-relational databases for Amazon.com and building highly scalable and reliable cloud computing services at AWS”. Offering an insider’s view, James Hamilton, a Distinguished Engineer at Amazon, says: “The DynamoDB service is a unified purpose-built hardware platform and software offering. The hardware is based upon a custom server design using Flash Storage spread over a scalable high speed network joining multiple data centers.” DynamoDB has been in the works for over a year and involved participation from many teams inside the company. Every indication, including the effort that went into the announcement (a live video stream rather than just the usual blog post), is that Amazon consider this a Big Deal.

DynamoDB addresses most of the major problems with SimpleDB, but not all of them. While Amazon keep the data safe for you, the only way to take a further backup is to use Elastic MapReduce to write the data out to S3, and there’s still no snapshot or incremental backup feature.

How is it priced?

Read the pricing guide on Amazon’s web site for the first time and tell me you don’t get a headache. As with most Amazon services, you are charged on a variety of metrics, including storage space and data transfer. Unlike Relational Database Service (RDS), where performance varies because it depends on the underlying Elastic Block Store (EBS) volumes, with DynamoDB you reserve a certain amount of read and write capacity per second, and if you exceed it, your requests risk being throttled. One unit of capacity covers reading or writing one item of up to 1KB per second, and larger items consume more units. You can easily change your throughput capacity using the API or the management console. There’s a free tier offering 100 MB of storage, 5 units of write capacity, and 10 units of read capacity; assuming your items are 1KB or smaller, that translates to 5 writes and 10 reads per second.

Would it be cost-effective to move to DynamoDB from another solution? Amazon’s pricing calculator can answer that for the hard performance numbers, but the savings from reduced maintenance are always harder to account for. As with many Amazon services, it’s very cheap to start with (even the smallest reliable Cassandra cluster requires two full-time machines in different availability zones), and it will scale up to the ends of the Internet with your growing use case, for a linear cost. The point at which economies of scale make running your own solution cheaper will obviously vary by organization, but not needing developers, system administrators, or DBAs to worry about your big data is a very compelling proposition.
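The capacity arithmetic is easier to follow worked through. This Python sketch assumes the rule described above, one unit per second for each item of up to 1KB, with larger items rounded up to the next whole KB; the rounding rule and the function names are assumptions for illustration, so check Amazon’s pricing page for the exact rules that apply to your account.

```python
"""Back-of-envelope provisioned-throughput estimates for DynamoDB.

Assumption: one capacity unit covers one request per second for an item
of up to 1KB, and larger items consume one unit per started KB.
"""
import math

UNIT_SIZE_KB = 1  # item size covered by one capacity unit (assumed)


def units_per_request(item_size_kb):
    # Round up to whole units: a 1.5KB item costs 2 units, 0.5KB costs 1.
    return max(1, math.ceil(item_size_kb / UNIT_SIZE_KB))


def units_needed(requests_per_second, item_size_kb):
    # Capacity to provision for a steady request rate at a given item size.
    return math.ceil(requests_per_second * units_per_request(item_size_kb))


def max_requests_per_second(units, item_size_kb):
    # Sustained request rate that a given number of units allows.
    return units / units_per_request(item_size_kb)


if __name__ == "__main__":
    # Free tier from the text: 10 read units and 5 write units.
    print(max_requests_per_second(10, 0.5))  # 10.0 reads/s of 0.5KB items
    print(max_requests_per_second(5, 3.2))   # 1.25 writes/s of 3.2KB items
    print(units_needed(200, 1.5))            # 400 units for 200 req/s
```

The second line of output shows why item size matters: the same 5 free write units that handle 5 small writes per second only handle 1.25 writes per second of 3.2KB items.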

Can I perform analytics on my data?

Elastic MapReduce (EMR), Amazon’s hosted version of Hadoop, now supports DynamoDB. As well as answering questions such as “how many people had a Kindle in their shopping cart and didn’t purchase it?”, Amazon suggest you use EMR to back up your DynamoDB data to S3 and to archive tables that are no longer used.
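As an illustration of the shape of such a job, the Python sketch below answers the Kindle question with an explicit map, shuffle, and reduce, run locally over a handful of in-memory records. It does not use any EMR, Hadoop, or Hive API, and the record format (source, customer, product) is a hypothetical export rather than anything DynamoDB defines; on EMR the same logic could be expressed as a Hadoop job or Hive query over the exported table.

```python
"""The "Kindle in cart but never purchased" question as a map/reduce job,
run locally in plain Python to show the map -> shuffle -> reduce shape.

The record format below is a hypothetical export of cart and order data.
"""
from collections import defaultdict

# Hypothetical records: (source, customer_id, product)
RECORDS = [
    ("cart", "alice", "kindle"),
    ("order", "alice", "kindle"),
    ("cart", "bob", "kindle"),
    ("cart", "carol", "book"),
    ("cart", "dave", "kindle"),
    ("order", "dave", "book"),
]


def map_phase(record):
    # Emit (customer, source) pairs only for Kindle-related events.
    source, customer, product = record
    if product == "kindle":
        yield customer, source


def shuffle(pairs):
    # Group emitted values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(set)
    for key, value in pairs:
        grouped[key].add(value)
    return grouped


def reduce_phase(customer, sources):
    # 1 if this customer had a Kindle in the cart but never ordered one.
    return 1 if "cart" in sources and "order" not in sources else 0


def abandoned_kindle_carts(records):
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return sum(reduce_phase(c, s) for c, s in grouped.items())


if __name__ == "__main__":
    print(abandoned_kindle_carts(RECORDS))  # 2 (bob and dave)
```

Because each customer’s events meet in a single reduce call, the map and reduce steps can run on many machines in parallel, which is what makes the pattern suit large exported tables.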

Who’s using it?

Amazon say they have already moved a number of internal services to DynamoDB, including IMDb’s star ratings and the Amazon Cloud Drive service, which should be seeing an increase in demand since the US launch of the Kindle Fire late last year. Private beta testers include photo-sharing website SmugMug, whose CEO, Don MacAskill, says it has “finally realize[d] its goal of being 100% cloud-based” by replacing a relational database with DynamoDB. Jeff Barr’s introduction post and the Getting Started guide both have simple examples that could get you going on DynamoDB in under 15 minutes.


2. Dynamo Donuts & Coffee, Scott Beale/Laughing Squid

What does everyone else think about it?

Basho, the authors of Riak, offered congratulations to Amazon on the release, calling DynamoDB “good for the NoSQL movement and great for customer choice”, and emphasized that their product can run on other clouds and behind the firewall. Billy Bosworth, CEO of DataStax (a company providing commercial support for Cassandra), posted to suggest that other NoSQL databases are not the enemy, and that anyone who helps people move from the much larger RDBMS market is a friend. DataStax also posted a comparison between Cassandra and DynamoDB, in which the winner is, unsurprisingly, Cassandra. Given the power that comes from a well-established open source tool, they believe few of the early adopters of NoSQL are likely to move soon, especially those, such as Netflix (who replaced Oracle with Cassandra when they moved from datacenter to cloud), who want cross-regional replication. However, the turn-key nature of a managed system moved many people from MySQL to RDS, and no other provider can offer a service on SSD, so the market is now wide open.

What else is interesting about DynamoDB?

DynamoDB is the first AWS service to run on SSD devices. For a key-value store, where almost no processing is required to fetch a row of data, disk performance is very important. Traditional relational databases often spend a lot of CPU time performing joins and calculating what data to return, so the gains from improved disk performance are less obvious. Being able to use components such as SSDs is one of the advantages of building your own hardware rather than relying on the commodity cloud: running Cassandra on EC2 today means using spinning disks. Amazon proudly proclaim that they deliver new services based on customer demand, so if you want RDS and EBS volumes on SSD, now is the time to ask.
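A rough model shows why. The Python sketch below treats a request as CPU time plus a number of random storage reads, and compares how much faster the same request gets when it moves from spinning disk to SSD. Every figure in it (8 ms per random read on a spinning disk, 0.1 ms on flash, and the CPU times) is an assumed ballpark for illustration, not a measurement of DynamoDB, EC2, or any particular drive.

```python
"""Why storage latency dominates a key-value lookup: a back-of-envelope model.

All numbers are assumed, illustrative ballpark figures, not measurements.
"""

HDD_READ_MS = 8.0  # assumed random read on a spinning disk (seek + rotation)
SSD_READ_MS = 0.1  # assumed random read on flash


def request_ms(cpu_ms, storage_reads, read_ms):
    # Simple serial model: CPU work plus each random storage read in turn.
    return cpu_ms + storage_reads * read_ms


def ssd_speedup(cpu_ms, storage_reads):
    # How many times faster the same request gets on SSD than on HDD.
    hdd = request_ms(cpu_ms, storage_reads, HDD_READ_MS)
    ssd = request_ms(cpu_ms, storage_reads, SSD_READ_MS)
    return hdd / ssd


if __name__ == "__main__":
    # Key-value get: almost no CPU work (assumed 0.05 ms), one random read.
    print(f"key-value get:    {ssd_speedup(cpu_ms=0.05, storage_reads=1):.0f}x")
    # CPU-heavy relational query (assumed 20 ms of joins), one random read.
    print(f"join-heavy query: {ssd_speedup(cpu_ms=20.0, storage_reads=1):.2f}x")
```

Under these assumptions the key-value get becomes more than 50 times faster, while the join-heavy query improves by well under 1.5 times. Real relational queries usually read more than one row, so treat this as an illustration of the argument rather than a benchmark.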

More reading

Amazon DynamoDB web site

Announcement from AWS evangelist Jeff Barr

DynamoDB press release

Launch video

Blog post from Amazon CTO, Werner Vogels

Blog post from Amazon Distinguished Engineer, James Hamilton

Notes about DynamoDB from Alex Popescu

Thoughts on SimpleDB, DynamoDB and Cassandra from Adrian Cockroft

Image credits

1. Steam powered dynamo photo by autowich on Flickr

2. Dynamo Donuts and Coffee photo by Scott Beale/Laughing Squid