The MongoDB 4.0 release introduced multi-document transactions to the popular open source NoSQL database. The lack of transactional capability has long been a key factor limiting MongoDB uptake, so it’s not surprising that the company, its users and the technology press have greeted this release with enthusiasm.
There’s no doubt that transactions represent a major leap forward for MongoDB. While the scalability limits imposed by ACID transactions eventually helped break the hegemony of the RDBMS model, the lack of any transactional capability in MongoDB restricted the range of applications the database could support. Many MongoDB users were forced to implement complex application logic to work around this limitation. With the advent of transactions, MongoDB developers can build systems that have strict requirements for consistency across documents. Because transactional behaviour is still optional, these new capabilities need not have a negative effect on throughput, scalability or availability for workloads that do not use them.
However, it’s worth remembering that transactional capabilities like those MongoDB now offers have been commonplace in relational databases for decades. Oracle introduced transactional capabilities in 1984 and had an implementation matching the current MongoDB standard by 1988. So while MongoDB transactions are welcome, they are hardly revolutionary.
Furthermore, the initial implementation of MongoDB transactions is limited in capability compared with familiar relational systems, and those adopting MongoDB transactions need to be aware of these limitations.
Firstly, MongoDB transactions can last only a relatively short time. By default, a transaction must complete within one minute of wall-clock time. This limitation stems from the underlying MongoDB implementation. Almost all transactional databases implement the Multi-Version Concurrency Control (MVCC) model. In MVCC, the database keeps multiple versions of each data item, and a query always reads the versions that were current when the query began. In this way, queries see a consistent view of the data, unaffected by changes that other transactions make while the query is executing.
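To make the MVCC rule concrete, here is a toy in-memory model. It is not how MongoDB or its storage engine actually implement versioning; the class and method names (`MVCCStore`, `write`, `snapshot`, `read`) are hypothetical. It shows only the core idea: each write creates a new timestamped version, and a reader sees the newest version committed at or before its snapshot.

```python
import itertools


class MVCCStore:
    """Toy multi-version store (hypothetical, not MongoDB's implementation).

    Every write appends a new version stamped with a commit timestamp, and a
    reader sees the newest version committed at or before its snapshot.
    """

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._versions = {}               # key -> list of (commit_ts, value)

    def write(self, key, value):
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def snapshot(self):
        # A snapshot is just "the current time": later writes get larger timestamps.
        return next(self._clock)

    def read(self, key, snapshot_ts):
        # Scan newest-first and return the first version visible to the snapshot.
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None
```

For example, a reader that takes a snapshot after `write("balance", 100)` keeps seeing `100` even after a later `write("balance", 50)`, while a fresh snapshot sees `50`. The cost is that old versions must be retained for as long as any reader might still need them.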
MongoDB uses MVCC but, unlike databases such as Oracle, keeps the “older” versions of data only in memory; other databases can persist these older versions to disk. To prevent these older versions from filling MongoDB’s memory, transaction lifetimes are limited so that the retained versions need cover only a short window of recent changes.
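The trade-off can be sketched as follows: every open transaction pins the versions it might still read, so capping transaction lifetime caps how much history must stay in memory. This is a toy model with hypothetical names (`TransactionReaper`, `reap`, `oldest_start`); only the 60-second default comes from MongoDB, where it corresponds to the `transactionLifetimeLimitSeconds` server parameter.

```python
import time

LIFETIME_LIMIT_SECONDS = 60  # mirrors MongoDB's default transactionLifetimeLimitSeconds


class TransactionReaper:
    """Toy model (hypothetical): abort transactions that outlive the limit and
    report the oldest start time still in use. History older than that point
    is needed only as the newest version visible to it, so the rest could be
    discarded from memory."""

    def __init__(self, limit=LIFETIME_LIMIT_SECONDS, clock=time.monotonic):
        self.limit = limit
        self.clock = clock
        self.open_txns = {}  # txn_id -> start time in seconds

    def begin(self, txn_id):
        self.open_txns[txn_id] = self.clock()

    def reap(self):
        """Abort (forget) over-age transactions; return their ids."""
        now = self.clock()
        expired = [t for t, start in self.open_txns.items() if now - start > self.limit]
        for t in expired:
            del self.open_txns[t]
        return expired

    def oldest_start(self):
        # None means no open transactions, so no old versions need to be kept.
        return min(self.open_txns.values(), default=None)
```

Without the limit, a single forgotten transaction would keep `oldest_start()` fixed indefinitely, and the version history behind it would grow without bound.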
MongoDB transactions are propagated to other nodes in the cluster using the normal replication mechanism, which means that each committed transaction is recorded as a single “oplog” entry. Oplog entries are subject to MongoDB’s 16MB document size limit, so a single transaction can change no more than 16MB of data. Large bulk updates or inserts therefore cannot be processed in a single transaction.
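A common workaround is to split a bulk load into batches that each fit within the limit and to commit each batch in its own transaction. The sketch below is hypothetical (`batch_for_transactions` and `approx_size` are not MongoDB APIs), and it estimates size from JSON encoding as a stand-in for real BSON sizes, which differ; in practice one would pass a `limit` well below 16MB to leave headroom for the oplog entry's own overhead.

```python
import json

MAX_TXN_BYTES = 16 * 1024 * 1024  # 16MB, MongoDB's maximum BSON document size


def approx_size(doc):
    # Assumption: JSON-encoded length as a rough proxy for BSON size.
    return len(json.dumps(doc).encode("utf-8"))


def batch_for_transactions(docs, limit=MAX_TXN_BYTES, size_of=approx_size):
    """Group docs into batches whose estimated total size stays within `limit`,
    so that each batch could be written in its own transaction."""
    batches, current, current_size = [], [], 0
    for doc in docs:
        size = size_of(doc)
        if size > limit:
            raise ValueError("a single document exceeds the per-transaction limit")
        if current and current_size + size > limit:
            # Adding this doc would overflow the batch: close it and start a new one.
            batches.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

The design trade-off matters: each batch is atomic on its own, but the load as a whole is not, so a failure partway through leaves the earlier batches committed.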
Finally, MongoDB's locking mechanism differs from the one RDBMS programmers are used to. In a typical RDBMS, an attempt to modify a row that has been changed by a still-uncommitted transaction results in a blocking lock: the session attempting the modification waits until that transaction completes before it can proceed. The database typically tracks the waiting sessions in a lock queue, which ensures that sessions blocked on a lock are served in the order in which they arrived.
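The blocking, first-come-first-served behaviour can be sketched with a toy lock built on Python's `threading.Condition`. `FifoRowLock` and its methods are hypothetical and stand in for a single row's lock; a real RDBMS lock manager is far more elaborate.

```python
import threading
from collections import deque


class FifoRowLock:
    """Toy blocking row lock (hypothetical): sessions that find the row locked
    wait, and are granted the lock strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._queue = deque()  # tickets of waiting sessions, oldest first
        self._held = False

    def acquire(self):
        ticket = object()  # unique per call, so queue positions cannot collide
        with self._cond:
            self._queue.append(ticket)
            # Block until the lock is free AND this session is first in line.
            while self._held or self._queue[0] is not ticket:
                self._cond.wait()
            self._queue.popleft()
            self._held = True

    def release(self):
        # Called when the owning transaction commits or rolls back.
        with self._cond:
            self._held = False
            self._cond.notify_all()

    def waiting(self):
        with self._cond:
            return len(self._queue)
```

Because waiters are served from the head of the queue, sessions that block in the order s1, s2, s3 acquire the row in that same order once the holder releases it; the `while` loop also guards against spurious wake-ups.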
MongoDB does lock a document that is being modified by a transaction. However, other transactions that attempt to modify that document do not block. Instead, they are aborted with a write conflict, and the application must retry the whole transaction. This is potentially wasteful, since the other operations in the transaction must be re-executed, and it also means that competing requests are serviced in a non-deterministic order.
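The abort-and-retry pattern can be sketched with a toy, single-threaded model. Everything here (`ToyStore`, `Txn`, `WriteConflict`, `run_with_retry`) is hypothetical and only illustrates the control flow: a write to a document locked by another open transaction fails immediately rather than queueing, and the caller must discard and re-run the entire transaction body. In real deployments, MongoDB drivers attach the `TransientTransactionError` label to such failures, and later driver releases provide helpers such as PyMongo's `ClientSession.with_transaction` that run this kind of retry loop for you.

```python
import itertools


class WriteConflict(Exception):
    """The document is locked by another open transaction; the caller must retry."""


class ToyStore:
    """Hypothetical single-node store used only to illustrate fail-fast locking."""

    def __init__(self):
        self.docs = {}
        self.locks = {}  # doc_id -> id of the transaction holding the write lock
        self._ids = itertools.count(1)

    def begin(self):
        return Txn(self, next(self._ids))


class Txn:
    def __init__(self, store, txn_id):
        self.store, self.id, self.pending = store, txn_id, {}

    def update(self, doc_id, value):
        owner = self.store.locks.get(doc_id)
        if owner is not None and owner != self.id:
            # No lock queue: fail immediately instead of waiting for the owner.
            raise WriteConflict(doc_id)
        self.store.locks[doc_id] = self.id
        self.pending[doc_id] = value

    def _release_locks(self):
        for doc_id in self.pending:
            self.store.locks.pop(doc_id, None)

    def commit(self):
        self.store.docs.update(self.pending)
        self._release_locks()
        self.pending = {}

    def abort(self):
        self._release_locks()
        self.pending = {}


def run_with_retry(store, body, max_attempts=5, on_conflict=None):
    """Run body(txn) in a transaction, re-running it from scratch on conflict.

    Returns the number of attempts taken. `on_conflict` is an optional hook
    called after each aborted attempt (a real client might back off here).
    """
    for attempt in range(1, max_attempts + 1):
        txn = store.begin()
        try:
            body(txn)
            txn.commit()
            return attempt
        except WriteConflict:
            txn.abort()  # every operation in body() is discarded
            if on_conflict is not None:
                on_conflict()
        except Exception:
            txn.abort()
            raise
    raise RuntimeError("transaction still conflicting after %d attempts" % max_attempts)
```

If another transaction holds `acct:1`, an attempt that first updates `acct:2` and then `acct:1` is aborted, so the `acct:2` update is thrown away and performed again on the next attempt: exactly the wasted work described above. Which of several competing clients eventually wins depends on retry timing rather than arrival order.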
MongoDB transactions are indeed a big step forward for the MongoDB database, but they do have limitations that are bound to cause issues for early adopters. We expect the next few releases of MongoDB to address some of these concerns.