Sponsored Content: MongoDB’s Drive to Multi-Document Transactions

MongoDB Atlas is a cloud-hosted MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we've learned from optimizing thousands of deployments across startups and the Fortune 100.

Transactions are important. Any database needs to offer transactional guarantees to enforce data integrity. But they don't do all do it in the same way-different database technologies take different approaches:

  • Relational databases model an entity’s data across multiple rows and parent-child tables, and so transactions need to span those rows and tables.
  • With subdocuments and arrays, document databases allow related data to be unified hierarchically inside a single data structure. The document can be updated with an atomic operation, giving it the same data integrity guarantees as a multi-table transaction in a relational database. 

Because of this fundamental difference in data modeling, MongoDB’s existing atomicity guarantees are able to meet the data integrity needs of most applications. In fact, we estimate 80%-90% of applications don’t need multi-document transactions at all. However, there are some legitimate use cases and workloads where transactions across multiple documents are needed. In those cases, without transactions, a developer would have to implement complex logic on their own in the application layer. Also, some developers and DBAs have been conditioned by 40 years of relational data modeling to assume multi-table/document transactions are a requirement for any database, irrespective of the data model they are built upon. Others are concerned that while multi-document transactions aren’t needed by their apps today, they might be in the future and they don’t want to outgrow their database.

And so, the addition of multi-document ACID transactions makes it easier than ever for developers to address a complete range of use-cases on MongoDB.

As one can imagine, multi-document transactions are a much more complex thing to build in a distributed database than in a monolithic, scale-up database. In fact, we have been working on bringing multi-document transactions to MongoDB as part of a massive multi-year engineering investment. We have made enhancements to practically every part of the system – the storage layer itself, our replication consensus protocol, sharding architecture, consistency and durability guarantees, the introduction of a global logical clock, and refactored cluster metadata management and more. And we’ve exposed all of these enhancements through APIs that are fully consumable by our drivers.

The figure below represents the evolution of these enhancements as well as the work in progress to enable multi-document transactions. As you can see, we are nearly done.

In MongoDB 4.0, coming in summer 2018*, multi-document transactions will work across a replica set. We will extend support for transactions across a sharded deployment in the following release.

Importantly, the green boxes highlight all of the critical dependencies to transactions that have already been delivered over the past 3 years. And, frankly, that was the hardest part of the project – how to balance building the stepping stones we needed to get to transactions with delivering useful features to our users straightaway to improve their development experience along this journey. Wherever we could, we built components that suited both goals. For example, the introduction of the global logical clock and timestamps in the storage layer enforces consistent time across every operation in a distributed cluster. These enhancements are needed for transactions in order to provide snapshot isolation, but they also allowed us to implement change stream resumability and causal consistency in MongoDB 3.6, which are immediately valuable on their own. Change streams enable developers to build reactive applications that can view, filter, and act on data changes as they occur in the database in real-time, and recover from transient failures. Causal consistency allows developers to maintain the benefits of strong data consistency with “read your own write” guarantees, while taking advantage of scalability and availability of our intelligent distributed data platform.

The global logical clock is just one example. A selection of other key enhancements along the way illustrates how our engineering team deliberately laid the groundwork for transactions in such a way that we consistently surfaced additional benefits to our users:

  • The acquisition of WiredTiger Inc. and integration of its storage engine way back in MongoDB 3.0 brought massive scalability gains with document level concurrency control and compression to MongoDB. And with MVCC support, it also provided the storage layer foundations for transactions coming in MongoDB 4.0.
  • In MongoDB 3.2, the enhanced consensus protocol allowed for faster and more deterministic recovery from the failure or network partition of the primary replica set member, along with stricter durability guarantees for writes. These enhancements were immediately useful to MongoDB users then, and they are also essential capabilities for transactions.
  • The introduction of readConcern in 3.2 allowed applications to specify read isolation level on a per operation basis, providing powerful and granular consistency controls.
  • Logical sessions in MongoDB 3.6 gave our users causal consistency and retryable writes, but as a foundation for transactions, they provide MongoDB the ability to coordinate client and server operations across the nodes of a distributed cluster, managing the execution context for each statement in a transaction.
  • Similarly, retryable writes, implemented in MongoDB 3.6, simplify the development of applications in the face of elections (or other transient failures) while the server enforces at most once processing semantics.
  • Replica set point in time reads in 4.0 are essential for transactional consistency, but it’s also highly valuable to regular read operations that don’t need to be executed in a transaction. With this feature, reads will only show a view of the data that is consistent at the point the find() operation starts, irrespective of which replica serves the read, or what data has been modified by concurrent operations.

The number of remaining pieces on the roadmap to transactions is small. Once complete, multi-document distributed transactions will provide a globally consistent view of data (both in replica set and sharded deployments) through snapshot isolation and maintain all-or-nothing guarantees in cases of node failures. This will greatly simplify your application code. After all, MongoDB’s job is to take hard problems and solve them for as many developers as possible, so that you can focus on adding value to your applications and not dealing with the underlying plumbing.

We’re really excited about the release of multi-document transactions, and what they will allow you to build with MongoDB going forward. You should view our multi-document transactions page to learn more, and we invite you to sign up for the beta program so that you can start to put all of the work we’ve done through its paces.

MongoDB Atlas is a cloud-hosted MongoDB service engineered and run by the same team that builds the database. It incorporates operational best practices we've learned from optimizing thousands of deployments across startups and the Fortune 100.