DBMS 2020: State of Play

Prior to 2008, whatever your database question, the answer was always “Oracle,” or sometimes “MySQL” or “SQL Server.” The relational database management system (RDBMS) model, really a triumvirate of technologies combining Codd’s relational data model, the ACID transaction model, and the SQL language, dominated database management systems completely.

However, around 2008–2009, a plethora of new database systems emerged, none of which closely followed the RDBMS model.

The straw that broke the back of relational dominance was the inability of the RDBMS to satisfy the needs of two of the largest web companies, Google and Amazon.

At Google, the sheer volume of data involved in indexing the World Wide Web led the company to develop new approaches to massive data storage such as the Google File System (GFS) and MapReduce. These technologies led the open source community to develop Hadoop. Hadoop seriously challenged the enterprise data warehousing segment of the RDBMS market.

At Amazon, the need to maintain an “always-on” transactional system was at odds with the consistency model of the RDBMS. Amazon developed its own “eventually consistent” DBMS—Dynamo—which is now commercialized as DynamoDB. The Dynamo model inspired Cassandra and several other early NoSQL database systems.

In addition to these “NoSQL” offerings, a variety of “NewSQL” databases also emerged. Some of these were inspired by research conducted by DBMS veteran Mike Stonebraker. In particular, a new breed of “columnar” storage systems—particularly well-suited to analytic workloads—such as Vertica and HANA emerged.

Once the dominance of the RDBMS was broken, it became possible for niche, specialized database systems to take root. Graph databases, such as Neo4j, and document databases, such as MongoDB, rapidly gained traction.

In the subsequent 10 years, some of the NoSQL and NewSQL entrants have flourished but more have disappeared. More and more databases are now hosted on cloud platforms, and we can see movement toward both consolidation and diversification.

RDBMS Remains Dominant, Though Stagnant

The RDBMS remains the dominant platform for database applications by far, in terms of both mind share and market share. The Oracle DBMS is the most popular DBMS, and MySQL (also owned by Oracle Corp.) is the second most popular (www.statista.com/statistics/809750/worldwide-popularity-ranking-database-management-systems). The top four most popular databases are generally reckoned to be Oracle, MySQL, SQL Server, and Postgres (although Postgres and MongoDB are neck-and-neck in many polls, such as one by DB-Engines, https://db-engines.com/en/ranking).

However, things are not all rosy in the RDBMS world. IDC and Gartner generally predict only modest growth for RDBMSs, at least those running on-premises. Furthermore, the popularity of the RDBMS among software developers, generally a very good leading indicator of long-term technological market share, is dismal. According to the Stack Overflow Developer Survey (https://insights.stackoverflow.com/survey/2019#technology), Oracle is the second-least-loved and second-most-dreaded database technology. Microsoft SQL Server does little better.

Developers generally have a big say in the frameworks for emerging applications, and they seem to be abandoning the major RDBMS platforms.

The NoSQL Race Narrows

A huge number of “NoSQL” databases emerged following the breakout of 2008–2009. In the past 10 years, most of these have disappeared, leaving a small number of NoSQL front-runners.

MongoDB, of course, is by far the most popular NoSQL database, both in terms of market share and developer enthusiasm. It remains the default choice for website development and an increasingly popular choice as a general purpose database. 

Cassandra has a much smaller installed base but has a strong beachhead in very large deployments. It retains very strong mindshare and a solid revenue base.

Both MongoDB and DataStax (the commercial face of Cassandra) are concentrating on consolidating their existing market niches and expanding their applicability to broader use cases. Neither seems to be planning any kind of a disruptive technological advancement. They see their opportunity in capturing new application workloads and in migration from what they consider to be the “legacy” RDBMS vendors such as Oracle.

Distributed SQL

As mentioned earlier, NoSQL arose out of a desire to achieve greater scalability by sacrificing some elements of consistency, and the SQL language was an incidental sacrifice toward that objective.

However, at Google, a group of engineers believed that NoSQL had made the wrong compromise. Rather than sacrificing consistency to achieve guaranteed scalability and availability, it might be possible to reduce the chance of an availability failure to almost zero, while maintaining strict consistency. The result was the Google Spanner database, which uses highly redundant networks and atomic clock synchronization to provide a database service that is completely consistent and—almost—completely available.

CockroachDB was inspired by the Spanner project and offers a SQL database with strict consistency and virtually unlimited scalability together with very high—if not perfect—availability.

NuoDB shares a similar vision, albeit one less directly influenced by Spanner. Unlike CockroachDB, NuoDB separates compute and storage nodes, which allows compute resources to be elastically provisioned without having to redistribute database storage.

Both NuoDB and CockroachDB are well-positioned to exploit the emerging paradigm shift created by containerization. Docker and Kubernetes have revolutionized software architecture by allowing the abstraction of hardware and operating system resources within a distributed system. The result has been a quantum leap in the ability of software engineers to create distributed systems. However, most legacy databases cannot fully exploit frameworks such as Kubernetes because of an underlying monolithic architecture.

The fully distributed databases, such as NuoDB, CockroachDB, and Cassandra, are well-positioned to work with Kubernetes and exploit an increasingly containerized future.
