MongoDB 5.0: Worth the Wait

It's been 3 years since MongoDB 4.0 was announced at the 2018 MongoDB World conference—and that is a long time in the software industry.

Transactions, introduced in MongoDB 4.0, arguably represented the final "missing" piece of the MongoDB puzzle: the last must-have feature needed for MongoDB to be regarded as a grown-up database suitable for production enterprise deployments.

Since then, MongoDB has continued to show steady and impressive growth, and each subsequent 4.x release (4.2 and 4.4) has introduced new and important features to the platform. But given the breadth of the existing portfolio, it has been hard to identify an obvious "next big thing" for MongoDB 5.0.

Now that MongoDB 5.0 has been announced, we can see that there is indeed no single flagship feature, but rather a diverse set of strong new capabilities that broaden the workloads MongoDB can support.

Time Series Support

Perhaps the most notable technical innovation in 5.0 is native time series support. Many applications insert data that is highly time-oriented. Such time series datasets are typically subject to distinctive analytic queries involving trends and aggregations across time boundaries. MongoDB's time series collections use internal storage optimized for these workload patterns.

When creating a time series collection, the user nominates the expected granularity of the data to be ingested (seconds, minutes or hours). Under the hood, MongoDB organizes the data into time series buckets with compression, pre-computed aggregations and optimized index structures to facilitate efficient date-oriented queries. Old time series data can also be configured for automatic purging or, for Atlas users, archived to low-cost storage.
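To make this a little more concrete, here is a minimal sketch in Python that builds the raw `create` command for a time series collection, of the kind one could pass to PyMongo's `Database.command()`. The `timeseries` option with its `timeField`, `metaField` and `granularity` fields, plus `expireAfterSeconds` for automatic purging, are the 5.0 option names as I understand them; the helper function, the collection name `weather` and the field names are hypothetical and purely illustrative.

```python
# Hypothetical helper: builds the raw "create" command for a MongoDB 5.0
# time series collection. Collection and field names are illustrative only.
VALID_GRANULARITIES = {"seconds", "minutes", "hours"}


def time_series_create_command(name, time_field, meta_field=None,
                               granularity="seconds", expire_after_seconds=None):
    """Return a command document suitable for Database.command() in PyMongo."""
    if granularity not in VALID_GRANULARITIES:
        raise ValueError(f"granularity must be one of {sorted(VALID_GRANULARITIES)}")
    timeseries = {"timeField": time_field, "granularity": granularity}
    if meta_field is not None:
        # metaField holds per-series labels (e.g. a sensor id) used for bucketing.
        timeseries["metaField"] = meta_field
    # The command name must be the first key; Python dicts keep insertion order.
    command = {"create": name, "timeseries": timeseries}
    if expire_after_seconds is not None:
        # Automatic purging: data older than this becomes eligible for deletion.
        command["expireAfterSeconds"] = expire_after_seconds
    return command


if __name__ == "__main__":
    cmd = time_series_create_command(
        "weather", time_field="timestamp", meta_field="sensor",
        granularity="hours", expire_after_seconds=30 * 24 * 3600)
    print(cmd)
    # With a live server and PyMongo installed, one would then run something like:
    #   MongoClient(uri)["demo"].command(cmd)
```

Choosing a granularity close to the real ingestion rate is what lets the server size its buckets sensibly, which is why it is a creation-time option rather than something inferred later.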

In the initial release, time series collections cannot be sharded or manipulated within transactions—I expect those limitations to be removed in upcoming releases.

Analytic "Windowing" Functions

Closely aligned with time series support is the introduction of new analytic "windowing" functions in the aggregation framework. Windowing functions will be familiar to SQL users: they allow a function to operate over a "window" of documents within an ordered dataset. For instance, windowing functions let you compare the current value with the average over the last 24 hours, or calculate a moving average.

While particularly useful for time series data, windowing functions have broad applicability in analytical contexts and can be used with any collection type. I expect MongoDB Charts to introduce strong new analytic visualizations based on windowing functions. The ability to detect trends in data can also be used to build self-learning, adaptive applications.
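The windowing functions are exposed through the `$setWindowFields` aggregation stage. Below is a sketch, assuming a hypothetical collection whose documents carry `sensor`, `timestamp` and `value` fields, of a pipeline computing a three-document moving average, followed by a small pure-Python function that mimics what a `{"documents": [-2, 0]}` window does within a single partition. The emulation is only an illustration of the semantics, not how the server implements it.

```python
# $setWindowFields is the 5.0 aggregation stage; the field names used here
# ("sensor", "timestamp", "value") are hypothetical.
pipeline = [
    {"$setWindowFields": {
        "partitionBy": "$sensor",          # one independent window per sensor
        "sortBy": {"timestamp": 1},        # windows are defined over this order
        "output": {
            "movingAvg": {
                "$avg": "$value",
                # Current document plus the two preceding ones.
                "window": {"documents": [-2, 0]},
            }
        },
    }}
]


def moving_average(docs, lower=-2, upper=0):
    """Pure-Python illustration of a document-based window over ONE partition.

    Mirrors {"documents": [lower, upper]}: near the edges of the partition
    the window simply contains fewer documents.
    """
    ordered = sorted(docs, key=lambda d: d["timestamp"])
    result = []
    for i, doc in enumerate(ordered):
        start = max(0, i + lower)
        end = min(len(ordered), i + upper + 1)
        window = [d["value"] for d in ordered[start:end]]
        result.append({**doc, "movingAvg": sum(window) / len(window)})
    return result


if __name__ == "__main__":
    readings = [{"timestamp": t, "value": v} for t, v in [(1, 10), (2, 20), (3, 30), (4, 40)]]
    for row in moving_average(readings):
        print(row)
```

For the "last 24 hours" case, a range-based window over a date sort key, along the lines of `"window": {"range": [-24, 0], "unit": "hour"}`, should replace the document-count window.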

Live Resharding

Sharding allows a write-intensive workload to be scaled out across multiple replica sets, or an application to be distributed geographically across multiple regions. However, it has historically been very difficult to change the shard key of an existing collection. In MongoDB 5.0, a collection's shard key can be changed online, without downtime.

This ability to redefine a shard key online will be welcomed by those who maintain large distributed systems, and in particular by those who want to migrate a large cluster to a geographically distributed topology. While this matters to a relatively small subset of users, those users run the largest and presumably most expensive MongoDB deployments today.
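In 5.0 this capability is exposed as the `reshardCollection` admin command (mongosh also offers an `sh.reshardCollection()` helper). The sketch below, in Python, simply builds that command document for a hypothetical `sales.orders` collection and a hypothetical compound key of `region` and `customerId`; with PyMongo it would be sent to the `admin` database, e.g. `client.admin.command(cmd)`.

```python
# Hypothetical helper building the 5.0 "reshardCollection" admin command.
def reshard_command(namespace, new_key):
    """Return the command that changes a collection's shard key online."""
    db_name, sep, coll_name = namespace.partition(".")
    if not sep or not db_name or not coll_name:
        raise ValueError("namespace must look like '<database>.<collection>'")
    if not new_key:
        raise ValueError("new_key must name at least one field")
    # The command name must come first; the key order defines the compound key.
    return {"reshardCollection": namespace, "key": dict(new_key)}


if __name__ == "__main__":
    # e.g. moving to a region-prefixed key ahead of a geo-distributed topology
    print(reshard_command("sales.orders", {"region": 1, "customerId": 1}))
```

Prefixing the key with something like a region field is the usual way to make zone-based geographic placement possible, which is exactly the migration this feature makes practical without a dump and reload.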

Release Cadence and Versioned API

MongoDB has announced changes to the frequency and nature of its releases, and is introducing a mechanism to support applications written against older releases over the long term.

From 5.0 onwards, MongoDB is committing to a major release every year, with quarterly Rapid Releases in between. However, while the Rapid Releases will be available for download, they will be certified for production use only on Atlas. This may cause consternation among users of the Community Edition and those running on-premises. MongoDB points out, though, that achieving a high release frequency requires reducing the amount of testing that must be performed across every supported hardware platform. Furthermore, the Rapid Releases remain available for evaluation and development, so I'd expect them to be widely deployed in non-critical contexts.

The versioned API is intended to let developers upgrade the back-end database to the latest version seamlessly while maintaining application compatibility. As a result, the overhead of migrating an application to a new database version should be radically reduced. MongoDB will inevitably end-of-life old versions, and forced upgrades are not always pain-free. In the future, these upgrades should take significantly less work, because the drivers and the database will continue to honor the API version an application was written against even after the back-end database is upgraded.

It remains to be seen how smoothly this will work in practice, but the intention of allowing longer-term support for older versions of database functionality is welcome.
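Mechanically, the idea is that the application declares an API version and the driver attaches it to every command, so a newer server knows which behavior to preserve. The Python sketch below is a simplified illustration of that tagging using the `apiVersion`, `apiStrict` and `apiDeprecationErrors` command fields; the helper name is hypothetical, and in practice you would not do this by hand, since PyMongo exposes the option as `MongoClient(uri, server_api=ServerApi("1"))` and handles it for you.

```python
# Simplified illustration of what a driver does when an application opts in
# to the versioned API. The helper itself is hypothetical.
def with_api_version(command, version="1", strict=False, deprecation_errors=False):
    """Return a copy of `command` tagged with MongoDB's API-version fields."""
    tagged = dict(command)  # copying preserves the command name as the first key
    tagged["apiVersion"] = version
    if strict:
        # Ask the server to reject commands outside the declared API version.
        tagged["apiStrict"] = True
    if deprecation_errors:
        # Ask the server to fail on features deprecated in that API version.
        tagged["apiDeprecationErrors"] = True
    return tagged


if __name__ == "__main__":
    print(with_api_version({"find": "orders", "filter": {"status": "open"}}, strict=True))
```

Because the version travels with each request rather than living in the server configuration, the same upgraded server can keep serving old and new applications side by side.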

Serverless Atlas

Serverless Atlas provides on-demand access to Atlas cloud database services. Rather than requiring you to configure a cluster of fixed size, Atlas dynamically provisions resources for your application's requests, scaling CPU and memory as required to handle the workload and releasing those resources when the request completes.

This configuration will suit applications that have infrequent bursts of activity rather than continuous and predictable transactional traffic, but it does suggest a future in which Atlas becomes a truly elastic DBaaS that bills only for resources actually applied to useful work.

Other Goodies

As always, there is a collection of incremental improvements to existing features:

  • Client-side Field Level Encryption now integrates with the Key Management Systems (KMS) provided by Google Cloud and Microsoft Azure. Previously, this feature natively integrated only with AWS KMS.
  • Data scientists using Python will welcome the PyMongoArrow API, which converts MongoDB query results into formats popular in machine learning and statistical analysis, such as pandas DataFrames, NumPy arrays and Apache Arrow tables.
  • Schema validation now provides useful diagnostics. Previously, a failed validation gave no indication of which rule in the schema had been violated, a really annoying experience!
  • The new MongoDB Shell—mongosh—has now reached General Availability. This shell includes many improvements over the traditional shell, including syntax highlighting and extensibility.
  • Long-running snapshot queries can now run for up to five minutes by default, or longer if configured, and no longer need to be wrapped in a transaction. Previously, snapshot reads were confined to transactions and their one-minute limit. Under the hood, snapshot history is kept in on-disk storage rather than in memory, which makes the longer window practical.
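As a rough sketch of the longer-running snapshot reads, the Python below builds two raw command documents: a `find` issued with read concern level `"snapshot"`, and an admin `setParameter` command adjusting `minSnapshotHistoryWindowInSeconds`, the server parameter that, as I understand it, governs the five-minute (300-second) default. The helper names, the `orders` collection and the filter are hypothetical; with PyMongo the first would go to the application database and the second to `client.admin.command()`.

```python
# Hypothetical helpers building raw commands; collection and filter are illustrative.
def snapshot_find_command(collection, query):
    """A find command that reads from a consistent point-in-time snapshot."""
    return {"find": collection, "filter": query, "readConcern": {"level": "snapshot"}}


def snapshot_window_command(seconds):
    """Admin command adjusting how long snapshot history is retained (default 300 s)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {"setParameter": 1, "minSnapshotHistoryWindowInSeconds": seconds}


if __name__ == "__main__":
    print(snapshot_find_command("orders", {"status": "open"}))
    # Extending the window to ten minutes; keeping more history costs storage.
    print(snapshot_window_command(600))
```

The trade-off behind the default is that every extra second of history the server retains is extra data it must keep around, so raising the window is worth doing only for workloads that genuinely need long consistent reads.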

Looking forward

MongoDB's latest marketing/architecture slide hints at the company's forward strategy. It places the document model at the core of the product line, but provides a unified interface across other data models such as graph and relational. It also shows multiple application types and workloads (search, mobile, transactional and so on) integrating with that unified API. Time series support can be seen as a step into that world of broader workload support.

It's worth noting that while MongoDB Atlas remains a cornerstone of MongoDB's revenue strategy, the company seems to have resisted the temptation to make these new features Atlas-only. There are certainly many Atlas-specific additions, but these are generally features that only make sense within the fully managed Database-as-a-Service paradigm. Community Edition users can take full advantage of the majority of the new features.

MongoDB's position as the leading modern, non-relational database seems secure for the time being. However, distributed SQL systems such as YugabyteDB and CockroachDB are growing in adoption and largely competing for the same enterprise workloads. In the cloud, MongoDB competes with mega cloud vendors such as Amazon and Microsoft, each of which is constantly trying to entice would-be Atlas users toward its own offerings. MongoDB needs to navigate carefully to maintain its dominance among developers while also keeping its commercial products economically compelling. MongoDB 5.0 seems well-positioned to continue driving MongoDB adoption, both with developers and in the enterprise.