Amazon’s Aurora: A Distributed SQL Database Alternative For MySQL Applications

SQL-based distributed cloud relational databases aren't new, but Amazon's Aurora offers an alternative to SQL Azure and, being MySQL-compatible, provides the obvious route to the cloud for hosted LAMP web applications. Is this of interest to the rest of us? Rob Sheldon investigates.

In November, during the annual AWS re:Invent conference, Amazon announced a new addition to its line of Relational Database Service (RDS) offerings: Amazon RDS for Aurora. Heralded as the harbinger of a new age of database as a service (DBaaS), Aurora was built from the ground up with simplicity, agility, and affordability in mind, leaving the rest of us to try to make sense of the implications.

According to all the marketing hoopla, Aurora brings the speed and availability of a high-end commercial database to a service that’s easy to use, quick to implement, and built for the cloud. Aurora appears on the scene in true Amazonian splendor, with expectations running high that it will take on not only MySQL wherever it resides, but also such giants as SQL Server, DB2, and the ever-ubiquitous Oracle Database.

Move Over MySQL

In 2009, Amazon launched RDS by serving up MySQL as a service in the cloud. Like any provider offering software as a service, Amazon took care of all the grunt work associated with setting up and managing the MySQL environment. Subscribers could get started with a few clicks and relatively little, if any, up-front cost.

Since the initial RDS launch, Amazon has added an assortment of options, high-availability features, and three more database engines: SQL Server, PostgreSQL, and Oracle Database.

For organizations ready and able to offload the resource-intensive headaches that come with implementing and maintaining their own systems, services such as RDS can prove a welcome relief. Amazon handles all the patching, provisioning, and day-to-day maintenance; performs backup and restore operations; provides failure detection; and addresses ongoing repairs. All subscribers have to do is pay their monthly fees and they’re good to go. When compared to the complexities and licensing nightmares that often accompany on-premises implementations, RDS can seem an enticing solution indeed.

As good as all this sounds, however, the fact remains that database engines such as MySQL and SQL Server were designed to run within constrained hardware and software environments, conforming to inherent limitations in processing and storage. Such systems were not built to maximize the benefits of cloud computing with its shared resource architecture and ability to run on commodity hardware (although this appears to be slowly changing, at least somewhat).

To address what Amazon perceived as a need for a more cloud-friendly database engine, the company spent three years developing its own relational database and came up with Amazon RDS for Aurora, the latest member of the RDS family.

Amazon has been touting Aurora as a commercial-grade database engine at an open-source price. Secure, scalable, fault-tolerant, and MySQL-compatible, Aurora is built on a service-oriented, multi-tenant architecture, offering performance and availability that Amazon says are comparable to high-end commercial databases.

According to Amazon, its R&D team started with a clean slate when designing Aurora, building the engine and storage layers from the ground up specifically to maximize the benefits of a cloud environment. The new design, so we are told, can fully leverage the cloud’s compute, storage, network, and software resources in order to handle demanding database workloads and eliminate bottlenecks caused by locking and I/O waits.

In preparation for Aurora’s official debut, Amazon has been bandying about the name MySQL with particular enthusiasm, promising feature compatibility with version 5.6 and seamless integration with existing MySQL applications. Migration, according to Amazon, requires little more than a few clicks and will result in unprecedented gains in throughput: up to five times that of a run-of-the-mill MySQL installation.

On the surface, such figures sound quite impressive, but specific details about the MySQL installation and benchmarks have been somewhat scarce, other than references to the testing environments as “standard.” Amazon provides even fewer details about comparisons to other database products, except to say that Aurora is joining the ranks of those “commercial-grade” database giants.

The Amazon Aurora Promise

Given that Aurora is still in a limited preview, it’s difficult for many of us to get a close look. Currently, only people in the eastern US can take the engine for a test drive. The rest of us must rely on hearsay and Amazon’s own descriptions (or promises) about performance, storage, availability, security, and usability.

On the performance front, Amazon claims to have clocked Aurora at over 30 million selects per minute, with updates coming in at about 6 million per minute. Subscribers who don’t need all that power out of the gate can start out small and then scale up their database instances as needed, up to 32 virtual CPUs (vCPUs) and 244 GiB of memory. And they can just as easily scale down, with such changes taking effect within minutes.

Each Aurora database is tightly integrated with an SSD-based virtualized storage layer designed specifically for database workloads. The storage is fault-tolerant and self-healing. The service automatically repairs disk failures, all in the background, without disrupting database availability. When subscribers first provision their Aurora databases, they start with only 10 GB of storage. Aurora then automatically scales up storage in 10-GB increments on an as-needed basis, up to 64 TB (or to the defined maximum, if less than 64 TB).
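
The growth rule described above can be modeled as a simple function. This is a minimal sketch, assuming storage is allocated in whole 10-GB steps starting at 10 GB and that 64 TB means 64 × 1,024 GB; the function name and the optional `max_gb` cap are illustrative only, not part of any Amazon API, and Amazon's actual allocation logic may differ.

```python
import math

START_GB = 10            # initial allocation described for new Aurora databases
INCREMENT_GB = 10        # growth step
LIMIT_GB = 64 * 1024     # 64 TB, assuming 1 TB = 1,024 GB


def allocated_storage_gb(data_gb: float, max_gb: int = LIMIT_GB) -> int:
    """Hypothetical model: storage needed to hold data_gb under 10-GB step growth."""
    if data_gb < 0:
        raise ValueError("data size cannot be negative")
    cap = min(max_gb, LIMIT_GB)  # a user-defined maximum may be lower than 64 TB
    if data_gb > cap:
        raise ValueError("data exceeds the maximum storage size")
    # Round up to the next 10-GB boundary, but never below the initial 10 GB.
    steps = math.ceil(data_gb / INCREMENT_GB)
    needed = max(START_GB, steps * INCREMENT_GB)
    # If the cap is not a multiple of 10 GB, growth stops at the cap itself.
    return min(needed, cap)


for size in (0, 9.5, 10, 10.5, 250, 251):
    print(size, "GB of data ->", allocated_storage_gb(size), "GB allocated")
```

Under this model, 250 GB of data fits exactly in 250 GB of allocated storage, while 251 GB pushes the allocation up one step to 260 GB.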

Aurora divides data into 10-GB segments stored on multiple disks and replicates each segment across three Availability Zones (AZs), with two copies in each zone. With this system, Aurora can handle the loss of up to two copies of data without impacting database write capabilities and up to three copies without impacting read capabilities. The service continuously scans the disks for errors and automatically replaces any disks on which errors are found.
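
The read and write tolerances above imply quorum sizes: if writes survive the loss of two of six copies, a write needs at least four copies to acknowledge it, and if reads survive the loss of three, a read needs at least three. The sketch below is a simplified model derived only from those stated numbers, not Amazon's actual implementation; it checks a segment's availability after a given number of lost copies and verifies the usual quorum-overlap conditions.

```python
AZ_COUNT = 3
COPIES_PER_AZ = 2
TOTAL_COPIES = AZ_COUNT * COPIES_PER_AZ  # 6 copies of each 10-GB segment

# Derived from the stated tolerances: writes survive losing 2 copies, reads survive losing 3.
WRITE_QUORUM = TOTAL_COPIES - 2  # 4 of 6
READ_QUORUM = TOTAL_COPIES - 3   # 3 of 6

# Every read quorum intersects every write quorum, so a read always reaches
# at least one copy that holds the latest acknowledged write.
assert READ_QUORUM + WRITE_QUORUM > TOTAL_COPIES
# Any two write quorums intersect, so two conflicting writes cannot both succeed.
assert 2 * WRITE_QUORUM > TOTAL_COPIES


def segment_availability(lost_copies: int) -> dict:
    """Return whether a segment can still be written and read after losing copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and 6")
    remaining = TOTAL_COPIES - lost_copies
    return {"write": remaining >= WRITE_QUORUM, "read": remaining >= READ_QUORUM}


print(segment_availability(2))  # an entire AZ lost: {'write': True, 'read': True}
print(segment_availability(3))  # an AZ plus one more copy: {'write': False, 'read': True}
```

In this model, losing a whole Availability Zone leaves the segment fully available, and losing a zone plus one more copy still allows reads, but not writes.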

Amazon also automatically backs up the data in a way that does not impact performance. If a failure occurs, users can restore their database to a specific point in time, as long as it’s within the retention period, which can be configured for up to 35 days. Subscribers can also generate database snapshots and store them on Amazon S3, where they’re retained until they’ve been explicitly deleted.
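
Whether a given restore target is still reachable comes down to a simple window check. This is a minimal sketch, assuming a retention period between 1 and 35 days and ignoring the short lag between the current time and the latest restorable time; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if target falls inside the point-in-time restore window."""
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention period must be between 1 and 35 days")
    # The oldest restorable moment is the retention period back from now.
    earliest = now - timedelta(days=retention_days)
    return earliest <= target <= now


now = datetime(2015, 1, 31, 12, 0, tzinfo=timezone.utc)
print(can_restore_to(now - timedelta(days=6), now, retention_days=7))   # True
print(can_restore_to(now - timedelta(days=10), now, retention_days=7))  # False
```

Manual snapshots stored on S3 fall outside this check, since they are kept until deleted rather than aging out of a window.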

In addition, Amazon lets customers add up to 15 replicas to their databases in order to achieve higher levels of availability and distribute read operations across multiple instances. Aurora uses the RDS Multi-AZ service to manage the replication operations and to automatically fail over a database should the primary instance go down, regardless of the zone where the database resides. Failover usually takes only a few minutes. Even if replication hasn’t been implemented, the service still tries to automatically create a new database instance, which can often be completed within 15 minutes.
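
Spreading reads across replicas typically means sending writes to the primary and rotating reads among the replicas. The class below is a hypothetical, client-side sketch of that pattern, with a deliberately simplified failover that promotes the first replica; the endpoint names are made up, and on Aurora itself failover is handled by the service, not by application code.

```python
import itertools

MAX_REPLICAS = 15  # replica limit described for Aurora


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: list):
        if len(replicas) > MAX_REPLICAS:
            raise ValueError("Aurora allows at most 15 replicas")
        self.primary = primary
        self.replicas = list(replicas)
        self._reset_cycle()

    def _reset_cycle(self):
        # Round-robin iterator over the current replicas (None if there are none).
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def route(self, is_write: bool) -> str:
        # Writes, and reads when no replica exists, must go to the primary.
        if is_write or self._cycle is None:
            return self.primary
        return next(self._cycle)

    def fail_over(self):
        """Simplified failover: promote the first replica to primary."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        self.primary = self.replicas.pop(0)
        self._reset_cycle()


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
print([router.route(is_write=False) for _ in range(3)])
# ['db-replica-1', 'db-replica-2', 'db-replica-1']
router.fail_over()
print(router.primary, router.route(is_write=False))
# db-replica-1 db-replica-2
```

Round-robin is only the simplest balancing choice; a real client might instead weight replicas by load or replication lag.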

For security purposes, each database runs in an Amazon Virtual Private Cloud (VPC), which RDS automatically creates when the database is first configured. The VPC isolates the database in a virtual network that users can connect to via their organization’s encrypted IPsec virtual private networks (VPNs). Amazon can also encrypt data both in transit and at rest, which includes data written to the Aurora storage as well as Amazon S3 backups. Aurora is also integrated with the AWS Identity and Access Management (IAM) service for controlling access to Aurora resources and data.

When compared to on-premises database systems, provisioning and managing an Aurora database is extremely easy. Most operations can be achieved with a few clicks in the AWS Management Console. Amazon also plans to add management support via new APIs, the AWS Command Line Interface, and AWS CloudFormation templates. In theory, users should be able to get an Aurora instance up and running within minutes. From there, they can manage the instance through the console and view over 20 operational metrics about compute, memory, and storage resources as well as details about the active connections, query throughput, and cache hit ratio.

An Industry in Flux

Although we’ve yet to see whether Amazon will be able to deliver on all its promises, particularly where performance and availability are concerned, there can be little doubt that the cloud behemoth is making a serious bid for its piece of the relational database pie. What this means for other database providers, whether on-premises or in the cloud, is yet to be seen.

A few things we do know, however. Aurora is simply a database engine and supporting service. We don’t get all the bells and whistles that come with a product such as SQL Server. There’s no Integration Services, no Analysis Services, no Reporting Services, and no integrated self-service BI tools.

Even organizations that care nothing about these extras, wanting only a powerful and reliable database engine, still might not be ready to head into the cloud, whether for security reasons, compliance reasons, or a general lack of control.

For many organizations, however, their primary concern is cost. Forgetting for a moment all the promised features that Amazon has so carefully laid out for its database brainchild, let’s take a look at the service’s pricing.

Aurora proselytizers are quick to point out that you pay for only what you use: no long-term commitments, upfront costs, or complex licensing structures. Subscribers pay by the hour for each database instance, whether the primary database or a replica, without the complexities and long-range planning that go into an on-premises system. Basic service for a single instance starts at 29 cents per hour for two virtual CPUs and 15.25 GiB of memory, plus 10 cents per GB of storage per month and 20 cents per million requests to cover I/O consumption. Users also have to pay for storing any database snapshots they might generate.
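
The pay-as-you-go rates above can be turned into a rough monthly estimate. This sketch uses only the quoted prices and rests on several assumptions: a 730-hour month, every instance (primary or replica) billed at the same basic hourly rate, and storage and I/O charged once for the database. Snapshot storage and taxes are left out, and the function name is hypothetical.

```python
HOURS_PER_MONTH = 730        # assumption: 8,760 hours a year / 12
INSTANCE_PER_HOUR = 0.29     # basic instance: 2 vCPUs, 15.25 GiB of memory
STORAGE_PER_GB_MONTH = 0.10
IO_PER_MILLION = 0.20


def monthly_on_demand_cost(instances: int, storage_gb: float, io_millions: float) -> float:
    """Estimate one month of on-demand charges, in US dollars."""
    instance_cost = instances * INSTANCE_PER_HOUR * HOURS_PER_MONTH
    # Assumption: storage and I/O are billed once, not per instance.
    storage_cost = storage_gb * STORAGE_PER_GB_MONTH
    io_cost = io_millions * IO_PER_MILLION
    return round(instance_cost + storage_cost + io_cost, 2)


# One primary with no replicas, 250 GB of data, 120 million I/O requests.
print(monthly_on_demand_cost(instances=1, storage_gb=250, io_millions=120))
# The same workload with two read replicas added at the same instance rate.
print(monthly_on_demand_cost(instances=3, storage_gb=250, io_millions=120))
```

Because instance hours dominate the bill in this model, each added replica raises the monthly estimate far more than extra storage or I/O does at these volumes.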

The subscription prices drop significantly if you commit to one or three years and pay an upfront fee. For example, if you were to choose the most basic service, without replication or snapshot storage, and commit to three years, you’d be looking at around $1600 in instance charges for that period, plus an upfront charge of $1250, as well as storage and I/O fees. If you average 250 GB of storage over the three years, storage should run about $900 (250 GB × $0.10 × 36 months). Estimating I/O gets a bit trickier, given the variables that can be involved, but even at 120 million requests per month, you’re looking at less than $900 for the three years (about $864 at 20 cents per million). So a basic database operation should come in at a bit over $4600, plus any applicable taxes.
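
The three-year estimate above is easy to reproduce. In this sketch, the roughly $1,600 reserved instance charge and the $1,250 upfront fee are the approximate figures quoted above, passed in as inputs, while storage and I/O are computed from the per-unit rates; the function name is hypothetical.

```python
STORAGE_PER_GB_MONTH = 0.10
IO_PER_MILLION = 0.20


def reserved_term_cost(upfront: float, instance_total: float, avg_storage_gb: float,
                       io_millions_per_month: float, months: int = 36) -> dict:
    """Break down the cost of a reserved term, in US dollars."""
    storage = avg_storage_gb * STORAGE_PER_GB_MONTH * months
    io = io_millions_per_month * IO_PER_MILLION * months
    parts = {
        "upfront": upfront,
        "instance": instance_total,
        "storage": round(storage, 2),
        "io": round(io, 2),
    }
    # Sum the four components before the "total" key is added.
    parts["total"] = round(sum(parts.values()), 2)
    return parts


print(reserved_term_cost(upfront=1250, instance_total=1600,
                         avg_storage_gb=250, io_millions_per_month=120))
```

This yields $900 for storage, $864 for I/O, and a total of $4,614, in line with the "bit over $4600" figure before taxes.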

Of course, you can implement a MySQL database in-house without spending anything, at least not for the licensing costs. But there are hardware, software, infrastructure, and personnel resources to consider. With all that, a cloud solution might start to look fairly appealing, especially for smaller businesses without the resources and expertise to back them up.

If we take our comparisons a step further and pull the pricier commercial products into the equation, not only do we have the licensing costs to contend with, but also their complex and often confusing structures. One wrong move can result in software audits that end up costing an organization thousands of dollars. Yet even if we stay on the straight and narrow, commercial licenses don’t come cheap. SQL Server 2014, for example, starts at over $7000 for the Standard edition, using per-core licensing.

Of course, if your server is supporting 10 to 20 databases and you’re in for the long haul, an in-house solution might still be the way to go, even using a commercial brand, rather than paying the unending subscription fees that go with a cloud provider. That said, a discussion about in-house implementations compared to cloud services is beyond the scope of this article. Such decisions are best made after a careful cost and services analysis that weighs all options and takes into account the many variables that go into a database implementation.

Even if we were to compare Aurora to other DBaaS offerings, such comparisons are no simple task. Each provider has its own way of packaging and pricing its database services and there’s seldom an easy way to line one up next to the other. But let’s give it a try.

As noted earlier, a fairly basic Aurora implementation should run about $4600 for three years. Now let’s consider Google’s Cloud SQL. The D4 tier provides only 2 GB of RAM but includes up to 4 million I/O operations a day. Getting this service for three years with 250 GB of storage will run close to $7000 at today’s rates. Azure SQL Database, on the other hand, offers the Standard service tier at the S0 level for under $600 for three years, again at current rates. The Standard tier at this level provides 250 GB of storage, supports up to 521 transactions per minute, and can use up to 10 database throughput units (DTUs), a measure of the system’s relative performance capacity.

To add confusion to the mix, consider the pricing for other Amazon RDS services. RDS for MySQL includes the large standard instance as one of its options, similar to Aurora’s basic service. However, the MySQL version will run only $3300 for that three-year period if you choose the Medium Utilization option. A comparable service tier on RDS for Oracle could easily exceed $6,500 for that same three years, including the Oracle Database licensing fees.

But again, comparing services is no small task, and pricing is only part of the equation (and a confusing one at that). Performance, availability, fault tolerance, security, and usability are also important considerations. If we simply compare RDS for Aurora to RDS for MySQL, we can at least point to Amazon’s own market-speak about Aurora’s superior performance capabilities, giving us some understanding of the cost difference between the two. At the same time, Amazon has also been quick to point out Aurora’s MySQL compatibility, no doubt in the hope of leading some organizations to choose Aurora over on-premises MySQL, especially if cost savings can be realized.

What Amazon has not yet fully explained is why anyone should choose Aurora over Azure SQL Database or Google Cloud SQL or some other service, let alone an on-premises commercial product. Those details, I’m assuming, are yet to come.

Too Soon to Tell

Much of the press that has surrounded the announcement about Aurora’s imminent debut has suggested that Amazon is moving not only into MySQL territory, but also into regions governed by big-league players such as Oracle, Microsoft, and IBM. Amazon has been quick to tout performance figures that it says rival those of Aurora’s commercial counterparts, but all we’ve really gotten out of Amazon are hints of how Aurora is unlike other databases, with few specifics that take on the competition.

At first glance, it might appear that Aurora is primarily competing against RDS for MySQL, offering a more robust service at a higher price. No doubt Amazon has more in mind, at the very least convincing organizations to give up their on-premises MySQL databases and embrace the Aurora cloud. However, only if Aurora is able to deliver on its promises of a faster, better, and more reliable database will Amazon be able to pull in customers from other cloud services and the big-name commercial products.

Clearly, it’s too soon to ring the death knell on SQL Server and others of its ilk, but Aurora is still likely to send ripples throughout the relational world, given the impact Amazon has had on other cloud-based endeavors. Whether we’re looking at a tsunami that will take out the old guard or at a mere shift in the tide is yet to be seen. But change is indeed in the air, and we’ve yet to feel the cloud’s full impact on the database world as we know it.