With Cosmos DB, Microsoft Attempts a ‘One-Size-Fits-All’ Database

After almost a generation of relative stability, database technology has been rocked over the past decade by two megatrends—the end of the one-size-fits-all RDBMS model and the rise of cloud computing.

For roughly 20 years (1988 to 2008), the RDBMS reigned supreme. The RDBMS architecture rested on three pillars: the relational data model, the ACID transactional model, and the SQL language. However, the RDBMS proved unable to power global-scale “always-on” websites such as Amazon, Google, and Facebook. By the end of the last decade, myriad alternatives such as Hadoop, Cassandra, and MongoDB, collectively known as “NoSQL” databases, had emerged.

Each of these next-generation databases solved a different problem. Hadoop provided an economical way of analyzing massive amounts of unstructured “big data.” Cassandra provided a way to power a transactional application at an immense scale. MongoDB aligned the programmer’s data model with the database, allowing for easier agile development and continuous integration. Meanwhile, the RDBMS remained prevalent, if no longer ubiquitous.

This embarrassment of choices has made it hard for database users to consolidate workloads on a single technology. Most enterprises now must support several database types, typically a mix of RDBMS and NoSQL systems.

With Cosmos DB, Microsoft is explicitly trying to provide a cloud-native database service that offers support for multiple workload types within a single platform. Cosmos DB is an Azure-native cloud system that supports multiple consistency levels, data models, and APIs.

Cosmos DB represents the end point of a decade of evolution within the Azure cloud platform. When Azure was first released in 2010, it supported multiple data storage solutions: a “blob service” similar to Amazon’s S3 and a “Table service,” which was essentially a key-value store. A third service, DocumentDB, added JSON storage. Meanwhile, a variety of SQL services based on Microsoft SQL Server were also available.

However, as far back as 2010, Microsoft was designing a next-generation database service that would be capable of supporting virtually any workload. Turing Award winner Leslie Lamport, as famous as any computer scientist in distributed systems, was part of the team.

At the heart of Cosmos DB is a geo-distributed and replicated database. Data is stored in containers that can be partitioned across regions. The partitioning can be adjusted dynamically by the system to optimize throughput. The partitioning can also be tuned to provide geographic optimization: Data specific to Europe can be located primarily in European data centers, for instance.
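To make the partitioning idea concrete, here is a minimal Python sketch of hash-based partition placement. It is an illustration only and not Cosmos DB's actual algorithm: the helper name `partition_for`, the partition count of 4, and the use of a `region` field as the partition key are all assumptions made for this example.

```python
import hashlib


def partition_for(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index using a stable hash.

    Hypothetical helper: a real service uses its own hashing and
    range-splitting scheme, and can rebalance partitions dynamically.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Sample items; "region" plays the role of the partition key (assumed).
orders = [
    {"id": "1", "region": "europe"},
    {"id": "2", "region": "europe"},
    {"id": "3", "region": "asia"},
]

# Items that share a partition key always land in the same partition.
placement = {o["id"]: partition_for(o["region"], 4) for o in orders}
```

Because the hash is deterministic, every item with the key "europe" maps to the same partition, which is what would let a system keep such a partition close to European users.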

Cosmos DB offers uniquely configurable options across consistency models, data models, and APIs.

Consistency ranges from strong to eventual, across five gradations in total. Strong consistency leads to slower but more robust applications. Eventual consistency allows for very rapid transaction processing with the possibility of transitory inconsistencies.
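The trade-off between the two extremes can be sketched with a toy model in Python. The enum lists the five levels as Microsoft names them (strong, bounded staleness, session, consistent prefix, eventual). Everything else is an assumption for illustration: `ToyReplicatedStore` is a hypothetical class, replication is simulated by an explicit `sync()` call, and all levels other than strong are treated as eventual, which is a deliberate simplification.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    # The five named levels, strongest first.
    STRONG = "Strong"
    BOUNDED_STALENESS = "Bounded staleness"
    SESSION = "Session"
    CONSISTENT_PREFIX = "Consistent prefix"
    EVENTUAL = "Eventual"


class ToyReplicatedStore:
    """Toy model: one primary plus one replica that lags until sync() runs."""

    def __init__(self):
        self.primary = {}
        self.replica = {}

    def write(self, key, value):
        self.primary[key] = value  # the replica has not seen this write yet

    def sync(self):
        self.replica = dict(self.primary)  # replication catches up

    def read(self, key, level):
        if level is ConsistencyLevel.STRONG:
            return self.primary.get(key)  # always reflects the latest write
        # Simplification: every weaker level reads the possibly stale replica.
        return self.replica.get(key)


store = ToyReplicatedStore()
store.write("cart", ["book"])
strong_read = store.read("cart", ConsistencyLevel.STRONG)
stale_read = store.read("cart", ConsistencyLevel.EVENTUAL)
store.sync()
converged_read = store.read("cart", ConsistencyLevel.EVENTUAL)
```

In this sketch the eventual read misses the write until replication catches up, after which both reads agree. That is the transitory inconsistency described above.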

Data can be presented to the user as key-value, a column family table, a graph, or a JSON document. Note, however, that the “table” model is not a relational table; rather, it is the wide-column family model popularized by Cassandra and HBase.
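As a rough illustration of how these four shapes differ, the following Python literals express one hypothetical customer record in each model. The field names and values are invented for this example, and the dictionaries only mimic the shape of each model; they do not reflect how any service stores data internally.

```python
# Key-value: the store sees an opaque value behind a single key.
key_value = {"customer:42": '{"name": "Ada", "city": "London"}'}

# Wide-column: column family -> row key -> named columns.
wide_column = {
    "customers": {
        "42": {
            "name": "Ada",
            "city": "London",
        },
    }
}

# Document: a nested, self-describing JSON-style object.
document = {
    "id": "42",
    "name": "Ada",
    "address": {"city": "London"},
    "orders": [{"sku": "A1", "qty": 2}],
}

# Graph: entities become vertices, relationships become edges.
graph = {
    "vertices": [
        {"id": "42", "label": "customer"},
        {"id": "A1", "label": "product"},
    ],
    "edges": [{"from": "42", "to": "A1", "label": "bought"}],
}
```

The same facts appear in every version. What changes is how much structure the database can see and therefore query.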

Each of these data models supports different APIs. The JSON document model can be queried using a MongoDB-compatible API. The wide-column format supports the Cassandra API. Graph storage can be traversed using the popular Gremlin graph language. There is also a SQL language implementation that can navigate through complex JSON documents.
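To give a feel for SQL over nested JSON, here is a small Python sketch. The query string is written in the general style of SQL-over-JSON dialects and is shown only as text; it is not executed against any service. The function `names_in_city` and the sample documents are hypothetical, and the function is simply a plain-Python equivalent of what such a query expresses.

```python
# Illustrative query text only (assumed syntax, not run against a database).
sql_query = "SELECT c.name FROM c WHERE c.address.city = 'London'"

documents = [
    {"id": "1", "name": "Ada", "address": {"city": "London"}},
    {"id": "2", "name": "Linus", "address": {"city": "Helsinki"}},
    {"id": "3", "name": "Grace"},  # no address at all
]


def names_in_city(docs, city):
    """Plain-Python equivalent of the query above for in-memory documents."""
    return [
        {"name": d["name"]}
        for d in docs
        # Documents without an address simply do not match.
        if d.get("address", {}).get("city") == city
    ]


result = names_in_city(documents, "London")
```

The dotted path `c.address.city` in the query corresponds to the nested lookup in the function, which is the sense in which a SQL dialect can navigate through a JSON document.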

Cosmos DB is a legitimate attempt to build a cloud-native database that provides support for multiple data models, APIs, and transactional modes. To date, Cosmos DB does not seem to be gaining mindshare; it ranks at number 25 on the database ratings, for instance. However, Microsoft is known for its persistence and patience; the company will continue to invest in Cosmos DB for years to come. Over time, it may be that Cosmos DB emerges as a leading cloud database system.