MongoDB 4.2 Boasts Incremental Improvements

Like most software companies, MongoDB aligns the launch of new releases with its annual user conference. This year, MongoDB is unveiling only a "minor" release: version 4.2. As a minor release, 4.2 is full of evolutionary rather than revolutionary features, but it still adds a lot of value.

The primary feature of 4.0 was, of course, ACID transactions. In 4.2, MongoDB has improved transactions in a couple of significant ways. First, transactions may now span database shards. Admittedly, the majority of MongoDB customers are not using sharding. However, for those that are (including some of MongoDB's biggest and most important customers), being able to span multiple shards is a significant improvement. Additionally, transactions can now exceed 16MB, which relieves a significant limitation of version 4.0.
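
From the application's side, a transaction that spans shards uses the same session API as a single-shard one. The following is a minimal sketch in Python, assuming the PyMongo driver and a 4.2 or later cluster; the `bank.accounts` namespace, the `balance` field, and the `transfer` helper are hypothetical names chosen for illustration.

```python
def transfer(client, from_id, to_id, amount):
    """Move `amount` between two account documents atomically.

    `client` is assumed to be a pymongo MongoClient connected to a
    MongoDB 4.2+ cluster. The two documents may live on different shards.
    """
    accounts = client["bank"]["accounts"]  # hypothetical namespace
    with client.start_session() as session:
        # The transaction commits when the block exits normally and
        # aborts if an exception is raised inside it.
        with session.start_transaction():
            accounts.update_one(
                {"_id": from_id}, {"$inc": {"balance": -amount}}, session=session
            )
            accounts.update_one(
                {"_id": to_id}, {"$inc": {"balance": amount}}, session=session
            )

# Usage (requires a running cluster):
#   from pymongo import MongoClient
#   transfer(MongoClient("mongodb://localhost:27017"), "alice", "bob", 25)
```

Passing `session=session` to each operation is what ties it to the transaction; an operation issued without it would run outside the transaction.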

MongoDB has supported a full-text search index for some time, but with the integration of Lucene in 4.2, full-text search has become far more powerful. Searches can now use Boolean, fuzzy, and compound operators, along with intelligent relevance scoring. This feature is currently available only in the MongoDB Atlas cloud.
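
To give a flavor of these operators, here is a rough sketch of an aggregation pipeline that combines a fuzzy match with a Boolean exclusion and returns the relevance score. It assumes the collection already has an Atlas Search index and uses the `$search` stage name from current Atlas releases (the beta used `$searchBeta`); the `title` field and the `search_pipeline` helper are hypothetical.

```python
def search_pipeline(term, excluded, limit=10):
    """Build an Atlas Search pipeline: fuzzy match on `term`, excluding `excluded`."""
    return [
        {
            "$search": {
                "compound": {
                    # Must match `term`, tolerating one character edit.
                    "must": [
                        {"text": {"query": term, "path": "title",
                                  "fuzzy": {"maxEdits": 1}}}
                    ],
                    # Must not match `excluded`.
                    "mustNot": [
                        {"text": {"query": excluded, "path": "title"}}
                    ],
                }
            }
        },
        {"$limit": limit},
        # Expose the relevance score alongside the title.
        {"$project": {"title": 1, "score": {"$meta": "searchScore"}}},
    ]

# Usage against an Atlas cluster (not executed here):
#   results = collection.aggregate(search_pipeline("mongodb", "legacy"))
```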

Client-side encryption is a feature that is implemented in certain MongoDB drivers. It encrypts selected fields before transmission to the server. As a result, the back-end server never sees the unencrypted data, which reduces the exposure from a server-side breach. The driver can be supplied with multiple private keys, each of which can be assigned to specific user identities or to a specific field. In this way, very fine-grained control over the encryption of data can be implemented. A hacker who obtained a private key would only be able to decrypt data specific to that key, rather than the entire database. Keys can be stored in external key management services such as AWS KMS.

The Atlas Data Lake allows data stored in external Amazon S3 buckets to be queried by a MongoDB database using standard database query commands.  This feature is roughly equivalent to the “external tables” feature that has long been part of SQL databases such as Oracle. The capability has a couple of clear use cases. Users who are trying to reduce their MongoDB cloud storage costs can offload their data into S3, which has a far lower cost per GB. Other users who may wish to analyze data in files stored in S3 can now use MongoDB commands rather than having to write many lines of procedural language code.    

The MongoDB Charts feature—a visualization tool for MongoDB data—has now moved from Beta to General Availability. The MongoDB Kubernetes operator is also GA. The Kubernetes operator allows MongoDB to be deployed using the Kubernetes container orchestration system.

Wildcard indexing allows an index to be created against all the attributes in a sub-document. Previously, you had to create an individual index for each attribute, and you needed to know in advance which attributes would be present. Wildcard indexes automatically cover new attributes as they are created. Wildcard indexes can improve query performance in many cases, but can also add overhead if used thoughtlessly.
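
As an illustration, a wildcard index is declared by appending `$**` to the sub-document's path. The sketch below assumes the PyMongo driver; the `attributes` sub-document and the helper names are hypothetical, and `leaf_paths` is only a simplified picture of which fields such an index would cover (it ignores arrays).

```python
def wildcard_index_keys(subdocument):
    """Key specification for a wildcard index over every field under `subdocument`."""
    return [(f"{subdocument}.$**", 1)]

def create_wildcard_index(collection, subdocument):
    # `collection` is assumed to be a pymongo Collection on MongoDB 4.2+.
    return collection.create_index(wildcard_index_keys(subdocument))

def leaf_paths(document, prefix=""):
    """List the dotted paths of scalar fields in a nested dict (arrays ignored)."""
    paths = []
    for key, value in document.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.extend(leaf_paths(value, path))
        else:
            paths.append(path)
    return paths

product = {"name": "chair", "attributes": {"color": "red", "size": {"width": 40}}}
covered = [p for p in leaf_paths(product) if p.startswith("attributes.")]
```

Here `covered` contains `attributes.color` and `attributes.size.width`; a document that later adds `attributes.material` would be covered without any change to the index definition.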

MongoDB also announced some useful though incremental changes to aggregation: output from aggregations can now be appended to existing collections, and aggregation pipelines can be embedded in update statements.
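
Both changes can be sketched briefly in Python. The first pipeline ends with the `$merge` stage introduced in 4.2, which writes results into an existing collection rather than replacing it; the second is a pipeline passed as the update document. The collection and field names (`customerId`, `amount`, `price`, `tax`, `total`) and the helper names are hypothetical.

```python
def totals_merge_pipeline(target_collection):
    """Aggregate per-customer totals and merge them into `target_collection`."""
    return [
        {"$group": {"_id": "$customerId", "total": {"$sum": "$amount"}}},
        {
            "$merge": {
                "into": target_collection,
                "on": "_id",
                "whenMatched": "replace",    # overwrite an existing total
                "whenNotMatched": "insert",  # add customers not yet present
            }
        },
    ]

def total_update_pipeline():
    """Update pipeline that derives `total` from other fields of the same document."""
    return [{"$set": {"total": {"$add": ["$price", "$tax"]}}}]

# Usage with pymongo against MongoDB 4.2+ (not executed here):
#   orders.aggregate(totals_merge_pipeline("customer_totals"))
#   orders.update_many({}, total_update_pipeline())
```

Because the update is expressed as a pipeline, the new value can reference the document's current fields, which a plain `$set` with a literal value cannot do.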

MongoDB 4.2 may seem like a grab bag of features, but all of the features represent useful additions to your MongoDB toolkit. Some features—the Atlas Data Lake, for instance—need significant enhancements to cover all conceivable use cases. Nevertheless, MongoDB 4.2 will be a useful upgrade and I’d expect it to be widely deployed.

It’s worth noting that some of the newest features in MongoDB 4.2 are available only on the cloud-based MongoDB Atlas platform, which is more evidence that MongoDB sees its future in the cloud.