Amazon Impresses Again with DynamoDB

It's hard to overestimate Amazon's influence on cloud computing and on NoSQL databases. 

Amazon Web Services (AWS) was the first, and is still the leading, concrete example of an infrastructure as a service (IaaS) cloud - a collection of cloud-based services such as compute (EC2), storage (S3) and other application building blocks.

Many of these services build on the Dynamo key-value store - a distributed, non-relational database foundation for public services such as S3. Dynamo pioneered many of the architectural principles of the NoSQL movement, such as the consistent hashing model for distributing data, and the eventual consistency paradigm.
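
To make the consistent hashing idea concrete, here is a minimal sketch in Python. It is not Amazon's implementation: the HashRing class, the node names, the use of MD5 and the choice of 64 virtual nodes per server are all assumptions made for illustration. Each server is placed at many points on a ring, and a key belongs to the first server point at or after the key's own position.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (MD5 is used only to spread keys)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place the node at several points on the ring to even out the load.
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def node_for(self, key):
        # Walk clockwise to the first node position after the key, wrapping around.
        pos = ring_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one server
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

The point of the design is visible in the last lines: adding a server relocates only the keys that the new server takes over, rather than reshuffling the whole data set as a simple "hash modulo number of servers" scheme would.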

Amazon published details of the Dynamo system in 2007, and many of the non-relational databases that emerged over subsequent years are substantially based on the Dynamo design.  For instance, LinkedIn's Voldemort and Basho's Riak are based almost entirely on Dynamo principles, while Apache Cassandra can be thought of as a hybrid of the Dynamo data partitioning scheme and the Google Bigtable data model.

However, Amazon's first attempt to provide a publicly available elastic database system was not a complete success.  SimpleDB was one of the earliest additions to the AWS family - and one of the first cloud-based "NoSQL" solutions - but it never came anywhere near the adoption levels of S3 and EC2.  Although reliable and scalable in terms of throughput, it offered unexceptional service levels for individual transactions and limited the maximum size of a single object.

Netflix was initially the largest user of SimpleDB: it famously used SimpleDB to move its on-premises Oracle database application almost entirely into the Amazon cloud.  However, Netflix eventually abandoned SimpleDB in favor of Cassandra (albeit still inside AWS).

With the release of DynamoDB, Amazon has acknowledged the limitations of SimpleDB and provided an alternative that is based more directly on proven Dynamo technology.

DynamoDB implements a fully scalable and elastic data storage engine hosted in the Amazon cloud, based on the Dynamo architecture.  However, unlike Dynamo but in common with NoSQL systems like Cassandra, it represents data in tabular format rather than as the opaque binary object typical of pure key-value databases.  This gives many applications a more familiar abstraction and offers more potential for data integration with other systems.
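
The difference between the two representations can be sketched with plain Python dictionaries. This is only an illustration of the data model, not the DynamoDB API: the key "user:42", the attribute names and the get_attribute helper are hypothetical.

```python
import json

# Pure key-value store: the value is an opaque blob the database cannot inspect.
kv_store = {}
kv_store["user:42"] = json.dumps({"name": "Ada", "plan": "pro"}).encode("utf-8")

# Tabular store: each item is a set of named attributes the database understands.
table = {}
table["user:42"] = {"name": "Ada", "plan": "pro", "logins": 17}


def get_attribute(items, key, attribute):
    """Read one named attribute of an item without touching the others."""
    return items[key].get(attribute)


# With the blob, the client must fetch and decode the whole value itself.
decoded = json.loads(kv_store["user:42"].decode("utf-8"))
plan_from_blob = decoded["plan"]

# With attributes, the store can return just the field that was asked for.
plan_from_table = get_attribute(table, "user:42", "plan")
```

Because the store knows the attribute names, other tools can read or load individual fields without agreeing on a private serialization format first.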

DynamoDB is currently provisioned entirely on solid-state disk (SSD) devices.  SSDs can service individual read requests in a small fraction of the time a magnetic disk needs (typically about 100x faster), although their cost per GB of storage is much higher (10-500x that of magnetic disk).  Consequently, DynamoDB can provide very low latency and high throughput compared to alternatives, though at a higher cost per unit of storage.  For instance, 1GB of DynamoDB storage costs $1/month, while S3 storage costs between 4 cents and 12 cents per GB/month (roughly 8 to 25 times cheaper).

DynamoDB also charges a variable fee based on the IO capacity you wish to provide - 1,000 read IOs per second cost about 20 cents per hour, with write IOs being about 5 times more expensive (SSDs can perform reads much faster than writes). 
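
As a rough worked example, the prices quoted above can be combined into a monthly estimate. The sketch below uses only the approximate figures from this article; the 730-hour month, the monthly_cost function and the sample workload (100GB, 1,000 reads and 200 writes per second) are assumptions for illustration, and actual AWS pricing may differ.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

STORAGE_USD_PER_GB_MONTH = 1.00     # "1GB of DynamoDB storage costs $1/month"
READ_USD_PER_HOUR_PER_1000 = 0.20   # "1,000 read IOs per second cost about 20 cents per hour"
WRITE_COST_MULTIPLIER = 5           # writes are "about 5 times more expensive"


def monthly_cost(storage_gb, reads_per_sec, writes_per_sec):
    """Estimate a monthly bill in US dollars from the article's approximate prices."""
    storage = storage_gb * STORAGE_USD_PER_GB_MONTH
    reads = reads_per_sec / 1000 * READ_USD_PER_HOUR_PER_1000 * HOURS_PER_MONTH
    writes = (writes_per_sec / 1000 * READ_USD_PER_HOUR_PER_1000
              * WRITE_COST_MULTIPLIER * HOURS_PER_MONTH)
    return {"storage": storage, "reads": reads, "writes": writes,
            "total": storage + reads + writes}


# Hypothetical workload: 100GB of data, 1,000 reads/sec and 200 writes/sec.
estimate = monthly_cost(storage_gb=100, reads_per_sec=1000, writes_per_sec=200)
for item, dollars in estimate.items():
    print(f"{item:>8}: ${dollars:,.2f}")

# Storage price relative to S3 at 4 and 12 cents per GB/month.
s3_ratio_high = STORAGE_USD_PER_GB_MONTH / 0.04
s3_ratio_low = STORAGE_USD_PER_GB_MONTH / 0.12
```

Under these assumptions the provisioned IO capacity, not the storage, makes up most of the bill, so it pays to size read and write capacity carefully.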

DynamoDB represents a new high-water mark for cloud-based, non-relational database technology.  The use of SSD and the proven elasticity of the Dynamo model should give applications running inside the Amazon cloud a quantum leap in database throughput and elastic scalability.

For applications outside the Amazon cloud, the performance advantages are less decisive.  The low latency and high throughput provided by DynamoDB will be harder to realize across the internet, where network latencies in the tens or hundreds of milliseconds will mask the SSD speed advantage.  Nevertheless, the advantages of a zero-administration, massively scalable cloud database solution are undeniable.
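
A little arithmetic shows how network latency masks the SSD advantage. The figures below are illustrative assumptions, not measurements: a 0.1ms SSD read, a 10ms magnetic disk read (in line with the roughly 100x difference mentioned earlier), a 1ms round trip inside the cloud and a 50ms round trip across the internet.

```python
def request_latency_ms(storage_ms, network_ms):
    """Total time for one read: network round trip plus storage service time."""
    return storage_ms + network_ms


def ssd_speedup(network_ms, ssd_ms=0.1, disk_ms=10.0):
    """How many times faster a request is on SSD than on magnetic disk."""
    return request_latency_ms(disk_ms, network_ms) / request_latency_ms(ssd_ms, network_ms)


# Assumed round-trip times: 1ms inside the cloud, 50ms across the internet.
inside_cloud = ssd_speedup(network_ms=1.0)
over_internet = ssd_speedup(network_ms=50.0)

print(f"Speedup inside the cloud:    {inside_cloud:.1f}x")
print(f"Speedup across the internet: {over_internet:.1f}x")
```

With these assumed numbers, the 100x advantage at the device shrinks to about 10x for a client inside the cloud and to roughly 1.2x for a client 50ms away.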