The Database Landscape 2021

Developer surveys, particularly the Stack Overflow developer survey, confirm that the most popular databases are the open source databases, particularly PostgreSQL and MongoDB. Redis and Elasticsearch, though not strictly speaking databases by everyone’s definition, are also popular.

Cloud Maturity

As argued earlier, the future of database growth is in the cloud. In 2019, Gartner estimated that 75% of databases would be deployed in the cloud by 2022. Given the acceleration of cloud adoption brought on by the pandemic, that figure could be even higher.

All database vendors had a cloud story as 2020 opened. However, in reality, these offerings were at very different levels of maturity.

I categorize “cloud” databases into four maturity levels:

  • Pretender: An on-premise database deployment running within a cloud-based virtual machine (VM). Essentially, it is an on-premise database on a VM that happens to be in the cloud.
  • Enabled: A scalable database cluster running on an elastic cloud infrastructure. In this scenario, the database isn’t built from the ground up to run in a cloud, but at least it’s able to add or remove nodes on demand to take advantage of elastic cloud resource provisioning.
  • Fully Managed: An “enabled” database cluster, where all management tasks such as backup and scaling are automatically managed by the infrastructure.
  • Native: A native cloud database is one that is built from the ground up to run in a cloud infrastructure. Typically, a native cloud database cannot run on-premise, unless within an on-premise private cloud.
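
The four levels can be read as a simple decision rule. The sketch below is only an illustration of that reading: the three boolean attributes and the `classify` function are hypothetical names introduced here, and real offerings rarely reduce to three yes/no answers.

```python
from dataclasses import dataclass
from enum import Enum


class CloudMaturity(Enum):
    PRETENDER = 1
    ENABLED = 2
    FULLY_MANAGED = 3
    NATIVE = 4


@dataclass(frozen=True)
class Offering:
    # Hypothetical simplifications of the definitions in the list above.
    elastic_scaling: bool       # can add or remove nodes on demand
    automated_management: bool  # backup, scaling, etc. handled by the infrastructure
    built_for_cloud: bool       # designed from the ground up for cloud infrastructure


def classify(offering: Offering) -> CloudMaturity:
    """Map an offering to a maturity level, checking the highest level first."""
    if offering.built_for_cloud:
        return CloudMaturity.NATIVE
    # "Fully Managed" is defined as an "enabled" cluster plus automated management,
    # so both properties are required.
    if offering.elastic_scaling and offering.automated_management:
        return CloudMaturity.FULLY_MANAGED
    if offering.elastic_scaling:
        return CloudMaturity.ENABLED
    # Everything else is treated as an on-premise database on a cloud VM.
    return CloudMaturity.PRETENDER
```

For example, `classify(Offering(elastic_scaling=True, automated_management=False, built_for_cloud=False))` yields `CloudMaturity.ENABLED`. Checking the highest level first reflects the idea that each level builds on the one below it.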

The major cloud vendors all offer native cloud databases: Google (Spanner), Microsoft (Cosmos DB), and Amazon (DynamoDB). Amazon and Microsoft also offer fully managed cloud deployments of open source and commercial databases. For instance, Microsoft offers fully managed versions of PostgreSQL and MySQL.

The rest of the database vendors are sadly late to the party. Oracle and DataStax Cassandra have offered fully managed cloud services only within the last year and don’t provide a genuinely cloud-native alternative. Of the major vendors, only MongoDB hit 2020 with a mature, fully managed offering: its Atlas cloud service recently celebrated its fourth birthday.

Technologies to Watch

The jostling for cloud dominance is largely occurring between mature database technologies. Of the “top five” databases, only MongoDB can claim to have been invented in this century. The other four—Postgres, SQL Server, Oracle, and MySQL—were all invented in the 1980s and 1990s.

There is, however, still innovation in the database industry and many technologies worth watching. Databases that are designed primarily to synchronize data between mobile “sometimes connected” devices are gaining in popularity among developers. Strong contenders in this space include Google Firebase, MongoDB, and Couchbase.

Graph databases continue to grow in adoption. Neo4j continues to lead the market, while newcomer TigerGraph has some technical advantages and shows early promise. However, graph capabilities are being added to many existing database platforms, and the standalone graph database market may eventually be consumed within the larger platforms.

A variety of next-generation databases designed to exploit modern cloud and application architectures have emerged in the last few years. NuoDB and CockroachDB are OLTP SQL databases that are designed from the ground up to exploit the container architectures of Docker and Kubernetes and to run more effectively in cloud environments. Snowflake is a cloud-native SQL database that focuses on analytic workloads.

Snowflake is benefiting from the declining significance of Hadoop. Although Hadoop was instrumental in pioneering the big data era, it failed to remain relevant as companies migrated to the cloud. However, Spark, an in-memory distributed processing engine that grew out of the Hadoop ecosystem, is still widely adopted.

What’s Ahead

I anticipate that the next few years will be a period of consolidation in the cloud; the database vendors that have been late to implement fully managed cloud services will race toward that goal, but struggle for growth until they get there. Companies with well-established, fully managed cloud services are well-positioned for growth.

There are few signs of a technological paradigm shift on the horizon. If quantum computers become mainstream, then all aspects of computer science will be revolutionized, and this includes databases. However, it seems unlikely that we’ll see much commercial impact over the next few years. And, while work continues on advanced storage devices that could blur the distinction between disk and memory, the universal memory revolution does not appear to be imminent. Blockchain technology, with its immutable data histories, holds the promise of being able to create tamper-proof and trustable databases. Startups such as Flur.ee and my own ProvenDB are offering blockchain database variations, while Oracle’s blockchain tables and Amazon’s QLDB offer pseudo-blockchain capabilities.
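
To make the idea of an immutable, tamper-evident history concrete, here is a minimal sketch of a hash chain, the basic building block behind that promise. It is a generic illustration only, not how any of the products named above are implemented; the function names are hypothetical, and a real blockchain adds distribution and consensus on top of this.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict, prev_hash: str) -> str:
    """SHA-256 over the record and the previous entry's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Add a record, linking it to the hash of the entry before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"record": record, "prev": prev_hash, "hash": _digest(record, prev_hash)}
    )


def verify(chain: list) -> bool:
    """Recompute every hash; any edit to an earlier record breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True
```

Appending two records and then changing a field in the first causes `verify` to return `False`, because the stored hash no longer matches the altered record. This makes tampering detectable rather than impossible, which is why production systems also anchor or replicate the hashes somewhere the database owner cannot quietly rewrite them.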

However, as businesses struggle to establish a COVID-normal business model, the key drivers are the ability to scale, support telecommuting, and optimize spend. To meet these challenges, database buyers are looking to cloud databases as the answer.

