For all of the 1990s and most of the 2000s, database management was synonymous with the relational database. By the end of the 1990s, relational database management systems (RDBMSs) used the SQL language, the relational data model, and ACID transactions to provide a one-size-fits-all solution for data management.
It’s been clear for at least five years that the relational database, though still the dominant model, is not the only game in town. Starting around 2008, a variety of “NoSQL,” “NewSQL,” and “big data” technologies emerged that are tailored to more specific workloads. These include big data technologies such as Hadoop and Spark, operational “web scale” NoSQL databases such as MongoDB and Cassandra, and “NewSQL” systems such as HANA and Vertica.
The suddenness of this non-relational “breakout” created a lot of noise and confusion and, at least initially, an explosion of new database systems. However, the database landscape is settling down: in the past few years, the biggest meta-trend in database management has been a reduction in the number of leading vendors and a consolidation of core technologies. Additionally, database as a service (DBaaS) offerings are becoming increasingly credible alternatives to on-premises or do-it-yourself cloud database configurations.
The Story So Far
The combination of the relational model of data, the atomicity, consistency, isolation, and durability (ACID) model for transactions, and the ubiquitous SQL language allowed the RDBMS to satisfy the requirements of virtually all workloads from the client/server era through to the early internet.
However, in the first decade of the 21st century, a new generation of computer applications emerged. Enabled by the now-universal wide-area network of the internet, these applications were global in scope, demanded continuous uptime, and generated unprecedented transaction rates and user populations. E-commerce sites such as Amazon, social networks such as Facebook, and data aggregators such as Google all found that the RDBMS could not adequately support these new workloads.
Following the launch of the iPhone and the subsequent widespread adoption of smartphones, consumer-facing companies found that they needed to provide both a web front end and a mobile experience, supported by a common “cloud” infrastructure. Furthermore, as consumers adopted a full-time internet presence for the first time, particularly through Facebook, it became advantageous to integrate with customers’ social profiles. Lessons learned from Google and Amazon also demonstrated that collecting and analyzing massive amounts of fine-grained data could be the key to competitive advantage.
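To make the atomicity part of ACID concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for any transactional RDBMS. The `accounts` table and the `transfer` function are hypothetical illustrations, not taken from any particular system: the debit and the credit either both commit or, if the transfer would overdraw the source account, both roll back.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Hypothetical helper: move funds so that both updates commit, or neither does."""
    # The connection's context manager commits on normal exit
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Aborting here undoes the debit above: the transaction is atomic.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)  # succeeds: both updates commit
try:
    transfer(conn, "alice", "bob", 500)  # fails: the partial debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)  # alice ends with 70, bob with 80
```

Using the connection as a context manager is the standard-library idiom for this: `sqlite3` commits when the `with` block exits normally and rolls back when it exits with an exception, so the failed transfer leaves no trace.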
This modern application architecture is sometimes referred to as “SMAC” for social, mobile, analytics, and cloud. Elements of SMAC can be found in almost all new applications and in most of today’s dominant software companies.
This architecture also increased the pressure to accelerate software delivery. Rather than releasing a couple of shrink-wrapped versions of software each year, cloud vendors found that they could release far more frequently or even continuously. This continuous deployment model was incompatible with older software lifecycles, leading to a blurring of the line between development and production and to the emergence of the “DevOps” movement.
The competing demands of SMAC and DevOps could not be satisfied—at least initially—by a single database architecture. New approaches rapidly emerged:
- Analytics, aka “big data” and “data science,” demanded an economical means of storing huge amounts of sometimes unstructured or semi-structured data. Hadoop, essentially an open source version of technologies pioneered by Google, emerged to fill this niche. Spark, simplistically an in-memory variation on Hadoop, enabled higher-performance and more rapidly iterative analytic workloads.
- Amazon’s Dynamo database allowed Amazon to provide a highly available global e-commerce experience. Amazon offers a Dynamo-inspired database as a service (DynamoDB), while Apache Cassandra is an open source database that incorporates core Dynamo paradigms. Dynamo is the most notable example of the eventual consistency model, which sacrifices strict consistency in favor of availability and performance.
- Document databases such as MongoDB and Couchbase are tightly coupled with the overall web development stack. Because their schema is largely defined by application code rather than declared in the database, they fully support continuous deployment and agile development practices.
- Graph databases such as Neo4j excel at modeling data when the relationships between entities are as significant as the entities themselves. Graph engines are increasingly being integrated into non-relational and relational systems to allow graph processing of data stored in other formats.
- These new-generation databases are comfortable running inside a cloud. However, there are also “cloud native” databases that don’t support on-premises deployment at all. Amazon DynamoDB, Microsoft Cosmos DB, and Google’s Spanner are examples of systems designed from the ground up to provide elastic scaling and other cloud characteristics.
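Eventual consistency in Dynamo-style systems is commonly explained in terms of three tunable numbers: N replicas per item, a read quorum R, and a write quorum W. In the simple, strict-quorum case, R + W > N guarantees that every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write; choosing smaller R and W improves availability and latency but allows stale reads. The sketch below uses a hypothetical `quorums_overlap` helper to check that overlap property by brute force. It models only the quorum arithmetic, not how Dynamo, DynamoDB, or Cassandra actually route requests or repair replicas.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Hypothetical helper: True if every possible read set of r replicas
    shares at least one replica with every possible write set of w replicas,
    out of n replicas in total."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# R + W > N: any read must touch a replica that saw the latest write.
print(quorums_overlap(3, 2, 2))  # True
# R + W <= N: a read can miss the write entirely and return stale data.
print(quorums_overlap(3, 1, 1))  # False
```

The brute-force check agrees with the pigeonhole argument: two subsets of sizes R and W drawn from N replicas must intersect exactly when R + W > N, which is why Dynamo-style systems expose R and W as a per-workload trade-off between consistency and availability.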
None of this is to say that the RDBMS has lost relevance. Indeed, by any measure, commercial RDBMSs such as Oracle and SQL Server are still the most widely deployed databases with by far the highest associated revenues. However, the RDBMS market is relatively stagnant, and the focus of innovation and growth is in the next-generation databases listed above.
Increasing Market Consolidation
Back in 2008 and 2009, it seemed that every week brought a new “revolutionary” database system. It was obvious even then that not all of these systems would stand the test of time. This year in particular, we’ve seen notable growth from some of the bigger vendors, while some of the smaller vendors are falling off the map.