A Key Open Source Database Emerges in Key-Value NoSQL

When NoSQL first hit the IT consciousness in 2009, an explosion of NoSQL databases seemed to appear out of thin air. Some of these contenders had in fact been around for some time, while others were thrown together rather quickly to exploit the NoSQL buzz.  Over time, the NoSQL pack thinned out as leaders in specific categories emerged.  For document-oriented databases, in my opinion, the clear leader is MongoDB; for BigTable-style databases, both Cassandra and HBase contend for leadership; while in graph databases, Neo4j remains the undisputed king.

For some time, there was no clear leading key-value NoSQL database.   Key-value stores place no restrictions on the structure that can be stored for a given key and, correspondingly, provide no metadata that could help an external application decode that data.  It’s entirely up to the application to decide what data gets stored and how it should be read.  This truly schema-less model allows for some fairly simple implementations, and many rudimentary key-value stores emerged during the early NoSQL land grab.

But many key-value stores were able to leverage Amazon’s pioneering early work on key-value store architecture.  In 2007, Amazon published the now famous Dynamo paper, which described how Amazon’s internal key-value stores handled data distribution, consistency and concurrency.  It was Dynamo that popularized the “eventual consistency” concept and introduced many widely adopted techniques, such as consistent hashing.   Subsequent key-value NoSQL databases were generally based on Dynamo – including Voldemort from LinkedIn, Cassandra at Facebook, and many now defunct early NoSQL databases.
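To make the consistent hashing idea concrete, here is a minimal Python sketch of a hash ring. It is not taken from Dynamo or from any particular database: the class name HashRing, the node names, and the choice of 64 virtual nodes per server are all assumptions made for illustration. Each key is owned by the first node position found clockwise from the key's own hash.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (illustrative names, not a real database API)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add(node, vnodes)

    @staticmethod
    def _position(label):
        # Map any string to a point on the ring using the first 8 bytes of SHA-1.
        digest = hashlib.sha1(label.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, node, vnodes=64):
        # Each physical node claims several points so that load spreads evenly.
        for i in range(vnodes):
            bisect.insort(self._ring, (self._position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, name) for pos, name in self._ring if name != node]

    def owner(self, key):
        # First ring entry at or after the key's position, wrapping at the end.
        pos = self._position(key)
        idx = bisect.bisect_right(self._ring, (pos, "")) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("customer:1001"))
```

The appeal of the scheme is that removing a server only reassigns the keys that server owned; every other key stays where it was, which is what lets a cluster grow or shrink without reshuffling all of its data.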

Riak emerged in 2008 as an open source Dynamo implementation sponsored by Basho Technologies.  The initial release of Riak in 2009 was immediately popular with software developers looking for a simple key-value system, and it attracted some high-profile early adopters such as Comcast and Citigroup.

Basho bolstered the base Riak open source system with a commercial enterprise distribution that supports multi-datacenter replication.  The recently released Riak CS is a storage engine compatible with Amazon Web Services (AWS) S3.  Amazon S3 (Simple Storage Service) provides a generic storage mechanism for Amazon Web Services users, and Riak CS implements that API.

Riak databases are created from a cluster of symmetrical servers – unlike HBase, there is no master node and, therefore, no single point of failure.  Machines in the cluster may vary in configuration, however, allowing the cluster to take advantage of the increased processing power of newer additions.

The Riak data model is very simple.  Riak “buckets” provide a namespace for storing related objects – usually, but not always, you put like objects (customer objects, product objects, etc.) inside the same bucket.  Inside a bucket, keys are associated with specific values that are defined at the program level.  There’s no defined structure – the value can be binary data, XML, JSON, or any sort of file format.
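The bucket, key and value model described above can be sketched in a few lines of Python. This is an in-memory illustration only, not the Riak client API: the class KeyValueStore and its put and get methods are hypothetical names. The point it shows is that the store treats every value as opaque bytes, so encoding and decoding are left entirely to the application.

```python
import json


class KeyValueStore:
    """In-memory stand-in for a bucket/key/value store (hypothetical API)."""

    def __init__(self):
        self._buckets = {}  # bucket name -> {key: bytes}

    def put(self, bucket, key, value):
        # The store never inspects the value; it only insists on raw bytes.
        if not isinstance(value, bytes):
            raise TypeError("encode the value to bytes before storing it")
        self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        # Returns None when the bucket or key does not exist.
        return self._buckets.get(bucket, {}).get(key)


store = KeyValueStore()

# The application chooses JSON for customers...
customer = {"name": "Ada", "city": "Austin"}
store.put("customers", "c1001", json.dumps(customer).encode("utf-8"))

# ...and arbitrary binary data for product thumbnails.
store.put("thumbnails", "p42", bytes([0x89, 0x50, 0x4E, 0x47]))

# Only the application knows how to interpret what comes back.
decoded = json.loads(store.get("customers", "c1001").decode("utf-8"))
print(decoded["name"])
```

Because nothing in the store records that "customers" holds JSON, a second application reading the same bucket has to be told the format out of band, which is the trade-off the schema-less model makes.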

Like all Dynamo-derived databases, Riak sacrifices strict consistency for scalability and high availability.   However, the Riak administrator or developer can tune the consistency model to balance performance, availability and consistency.
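In Dynamo-style systems this tuning is usually expressed with three numbers: N, the number of replicas kept for each key; W, the number of replicas that must acknowledge a write; and R, the number of replicas consulted on a read. The Python sketch below is an illustration under those assumptions rather than Riak code, and the function name is made up. It checks by brute force whether every possible read set is guaranteed to share at least one replica with every possible write set.

```python
from itertools import combinations


def read_always_overlaps_write(n, r, w):
    """True if any R replicas out of N must include one of any W written replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # a non-empty intersection is truthy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Compare a few settings for a key replicated three times.
for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w} overlap guaranteed: {read_always_overlaps_write(3, r, w)}")
```

The brute-force check agrees with the familiar rule of thumb that reads are guaranteed to see the latest acknowledged write when R + W > N. Lower values of R and W respond faster and tolerate more failed servers, at the cost of possibly returning stale data for a while.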

In addition to key-value lookups, Riak supports MapReduce for batch processing across all objects in a bucket, and can integrate with Solr for full-text search within a bucket.  There is also a mechanism for adding secondary indexes, which allows quick lookups of objects by values other than the primary key.
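Because the store cannot look inside an opaque value, a secondary index has to be supplied by the application alongside the value when it is written. The following Python sketch shows that idea in memory; IndexedBucket, its tags parameter, and the index names are hypothetical and do not reflect Riak's actual interface.

```python
from collections import defaultdict


class IndexedBucket:
    """Toy bucket with application-supplied secondary indexes (hypothetical API)."""

    def __init__(self):
        self._values = {}               # key -> opaque value
        self._tags = {}                 # key -> {index name: index value}
        self._index = defaultdict(set)  # (index name, index value) -> keys

    def put(self, key, value, tags=None):
        # Drop index entries left over from a previous version of this key.
        for entry in self._tags.get(key, {}).items():
            self._index[entry].discard(key)
        self._values[key] = value
        self._tags[key] = dict(tags or {})
        for entry in self._tags[key].items():
            self._index[entry].add(key)

    def get(self, key):
        return self._values.get(key)

    def lookup(self, index_name, index_value):
        # Return matching keys without scanning every value in the bucket.
        return sorted(self._index.get((index_name, index_value), set()))


customers = IndexedBucket()
customers.put("c1", b"...opaque bytes...", tags={"city": "Austin"})
customers.put("c2", b"...opaque bytes...", tags={"city": "Boston"})
customers.put("c3", b"...opaque bytes...", tags={"city": "Austin"})
print(customers.lookup("city", "Austin"))
```

The lookup returns keys rather than values, so the caller follows up with ordinary key-value reads for the objects it actually needs.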

Over the past few years, Riak has emerged as the most widely deployed and richest implementation of the Amazon Dynamo style of key-value store.   Other types of NoSQL stores provide richer data models and other advantages, but Riak represents a very strong leader in the NoSQL market.