<< back Page 2 of 3 next >>

DBMS 2020: State of Play

Hadoop in Decline

Hadoop represented a significant improvement in distributed processing for massive database workloads and upended the data warehousing market. But Hadoop has failed to maintain traction as enterprises move workloads into the cloud. The Hadoop Distributed File System (HDFS) has stronger and more economical equivalents within each of the cloud platforms (S3 in Amazon Web Services, for instance), and the operational overhead of Hadoop is far higher than that of the cloud-based storage equivalents.

Furthermore, while Hadoop delivered on its promise to provide economical storage for big data, it failed to provide a usable processing layer. The SQL layer built on top of Hadoop (Hive) was simply too slow and too limited. Spark, a sort of in-memory variation on the Hadoop theme, provided some relief and is still popular, both on-premise and in the cloud.

However, analytic workloads that once seemed destined for Hadoop are now likely to be held on a new generation of SQL-based data warehousing platforms. The data warehousing platforms of the on-premise era derived their advantage from the tight coupling between the software and hardware layers. The data warehousing platforms of the cloud era derive their advantage from elastic scaling and abstraction from the underlying hardware. SnowflakeDB is an example of such a database system. It’s fully SQL-compliant but completely cloud-native and capable of handling unstructured data as well as normalized data. It can approach the data volumes of Hadoop but with far lower operational overhead and far better performance.

For more articles like this one, go to the 2020 Data Sourcebook

The Graph Niche

Graph databases are highly optimized for workloads where the relationships between objects are as important as, or more important than, the objects themselves. Almost all general-purpose database systems support some form of graph processing. For instance, DataStax (the Cassandra company) is now the major contributor of code to the Gremlin open source graph language and is making graph a first-class citizen for Cassandra. Similarly, Cosmos DB from Microsoft supports graph as one of its standard data models and APIs.

However, efficient graph traversal requires that data be organized in a specialized graph format, one in which each object can reach its related objects directly rather than through repeated lookups. For this reason, dedicated graph databases can expect to maintain strong traction.
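
To make the point about storage layout concrete, here is a minimal Python sketch under simplifying assumptions. The sample data and every name in it (`EDGES`, `neighbors_by_scan`, `build_adjacency`, `reachable_within`) are hypothetical and are not taken from any product mentioned in this article. It contrasts a generic edge table, where each hop rescans all relationships, with a graph-style adjacency structure, where each node points straight at its neighbors.

```python
from collections import defaultdict, deque

# Hypothetical sample data: (source, target) "follows" relationships.
EDGES = [
    ("ann", "bob"),
    ("bob", "cara"),
    ("cara", "dan"),
    ("ann", "eve"),
    ("eve", "dan"),
    ("dan", "fay"),
]


def neighbors_by_scan(edges, node):
    """Edge-table style: every hop scans the full list of relationships."""
    return [dst for src, dst in edges if src == node]


def build_adjacency(edges):
    """Graph style: each node keeps a direct list of its neighbors."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency


def reachable_within(start, max_hops, neighbors_of):
    """Breadth-first traversal: nodes reachable in at most max_hops hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors_of(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


adjacency = build_adjacency(EDGES)
via_scan = reachable_within("ann", 2, lambda n: neighbors_by_scan(EDGES, n))
via_adjacency = reachable_within("ann", 2, lambda n: adjacency.get(n, []))
print(sorted(via_adjacency))  # ['bob', 'cara', 'dan', 'eve']
```

Both calls return the same answer, but the scan-based version touches every relationship on every hop, while the adjacency version touches only the neighbors of the node being expanded. A real general-purpose database would normally use an index rather than a full scan, so this exaggerates the gap; it is meant only to show why layout matters as traversals get deeper.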

Neo4j remains very popular as an embedded graph database engine. However, TigerGraph, which represents an attempt to build a more scalable and efficient graph engine, shows signs of strong early growth.

Database-as-a-Service Tipping Point

It almost goes without saying that we’ve passed the point of no return on cloud-based databases. All observers seem to agree that most new database workloads are now in the cloud, even if the vast majority of existing workloads remain on-premise.

However, there’s cloud, and then there’s cloud. The weakest form of cloud is one in which a database system simply runs inside a cloud-based VM. Such a configuration may offer some cost savings but still requires significant operational overhead.

Fully managed cloud services take a database system that is available on-premise, host it in the cloud, and provide full operational and administrative services such as backup, optimization, and scaling. These offerings can become very attractive since they massively reduce the human costs involved in running a database. MongoDB has seen very vigorous adoption of its fully managed cloud service, and most database vendors are rushing to provide a similar fully managed service if they don’t already offer one.

Native cloud databases, built from the ground up to run in the cloud and often without an on-premise option, can exploit both the operational and performance/scalability advantages of the cloud. These cloud-native databases (Microsoft Cosmos DB, Amazon DynamoDB, and Google Spanner, for example) may offer the best scalability, economics, and operational advantages, but they do require that you commit your data to a particular cloud vendor.

Vendors that cannot at least provide a fully managed cloud offering or a full cloud-native architecture will probably lose ground as database cloud migrations continue.
