MongoDB Change Streams

MongoDB is often found within a complete application stack that combines multiple technologies to deliver functionality to users both within and outside of the organization. Organizations often want automated workflows that propagate information between these disparate technologies.

For instance, the creation of an order record within MongoDB might trigger an alert to a stockroom employee to fill the order. Operations staff might be monitoring the status of orders in real time on various management dashboards. This information might also flow into a data warehousing system aggregating information from multiple databases, some of which are based on other database technologies such as Oracle or Postgres.

While it's possible to implement these sorts of integrations using application code, it can be more reliable and efficient to have changes made in the database automatically trigger the desired integration workflow. In many databases, code stored in the database (a database "trigger") can be invoked whenever a change to nominated tables or collections occurs. An alternative approach allows an external application to "listen" to a stream of changes that is published when data is modified. MongoDB change streams take the latter approach.

MongoDB change streams are implemented on top of the MongoDB replication system. When changes are made to data on the MongoDB primary instance, a record of those changes is written to an internal collection called the "oplog." MongoDB secondary instances read the oplog and apply the changes to maintain synchronization with the primary. Because of this dependency, change streams are available only on replica sets and sharded clusters, not on standalone instances.

In a way, the MongoDB change stream is an API layered on top of this oplog data. Programs can register an interest in changes for a specific collection, database, or deployment. Changes are "streamed" to the application as they occur, allowing the application to respond in real time to database changes.
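
As a rough illustration of registering for changes, here is a minimal sketch using the PyMongo driver. The connection URI, the "shop" database, and the "orders" collection are assumptions made for the example, as are the helper names describe_change and watch_orders; the watch() method itself is the driver's standard entry point for change streams.

```python
from typing import Any


def describe_change(event: dict[str, Any]) -> str:
    """Summarise a change event as '<operation> <db>.<collection> <key>'."""
    ns = event.get("ns", {})
    key = event.get("documentKey")
    return f"{event['operationType']} {ns.get('db')}.{ns.get('coll')} {key}"


def watch_orders(uri: str = "mongodb://localhost:27017") -> None:
    """Print every change to the (hypothetical) shop.orders collection."""
    # Imported lazily so describe_change() can be used without the driver.
    from pymongo import MongoClient

    client = MongoClient(uri)  # assumed to point at a replica set
    orders = client["shop"]["orders"]
    # The scope depends on the object watch() is called on:
    #   orders.watch()          -> a single collection
    #   client["shop"].watch()  -> every collection in one database
    #   client.watch()          -> the whole deployment
    with orders.watch() as stream:
        for event in stream:  # blocks until the next change arrives
            print(describe_change(event))
```

Calling watch_orders() against a running replica set would print one line per insert, update, or delete on the collection until the process is interrupted.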

The application can specify a "pipeline" that transforms or filters changes before they are read. For instance, a pipeline might request that only changes to high-priority orders be streamed, or it could add application-specific tags to the data returned.
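
The pipeline idea might look something like the following sketch. The "priority" field, the "appTag" field, and the function names are hypothetical; $match and $addFields are standard aggregation stages that change streams accept. The filter is limited to insert events here because update events do not carry the full document by default.

```python
from typing import Any


def high_priority_pipeline(tag: str) -> list[dict[str, Any]]:
    """Build a pipeline that keeps new high-priority orders and tags them."""
    return [
        # Keep only inserts whose document has priority == "high"
        # ("priority" is a hypothetical field on the order documents).
        {"$match": {"operationType": "insert", "fullDocument.priority": "high"}},
        # Attach an application-specific label to every event that passes.
        {"$addFields": {"appTag": tag}},
    ]


def watch_high_priority(uri: str = "mongodb://localhost:27017") -> None:
    """Stream the filtered events from the (hypothetical) shop.orders collection."""
    from pymongo import MongoClient  # lazy import; needs a replica set to run

    orders = MongoClient(uri)["shop"]["orders"]
    with orders.watch(high_priority_pipeline("stockroom-alert")) as stream:
        for event in stream:
            print(event["appTag"], event["fullDocument"])
```

Because the filtering runs on the server, events that do not match are never sent to the application.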

By default, the change stream emits the operation to be performed on the document rather than the document itself. For insert operations, this is, of course, the complete new document. For updates, however, it contains just the changes to be applied: we see only the values of the changed attributes, not the entire document. Prior to MongoDB 6.0, there was a limited capability to see the entire document (the "updateLookup" option), though this full document image reflects the state of the document at the time of the lookup, not necessarily the state corresponding to the change stream event. Starting with MongoDB 6.0, pre- and post-images can be enabled on a collection, which makes it possible to retrieve the full document both before and after the update event.
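
To make the difference concrete, the sketch below pulls the relevant parts out of an update event and shows one way pre- and post-images might be requested. The database and collection names and both function names are assumptions; the full_document_before_change argument assumes PyMongo 4.2 or later and a MongoDB 6.0+ server.

```python
from typing import Any


def summarize_update(event: dict[str, Any]) -> dict[str, Any]:
    """Pull the delta and any full-document images out of an update event."""
    desc = event.get("updateDescription", {})
    return {
        "changed": desc.get("updatedFields", {}),  # only the modified values
        "removed": desc.get("removedFields", []),
        "before": event.get("fullDocumentBeforeChange"),  # None unless requested
        "after": event.get("fullDocument"),  # None unless requested
    }


def watch_with_images(uri: str = "mongodb://localhost:27017") -> None:
    """Stream updates with before/after images (assumes MongoDB 6.0+)."""
    from pymongo import MongoClient  # lazy import; needs a replica set to run

    db = MongoClient(uri)["shop"]  # hypothetical database name
    # Pre- and post-images are off by default and are enabled per collection.
    db.command("collMod", "orders", changeStreamPreAndPostImages={"enabled": True})
    with db["orders"].watch(
        full_document="whenAvailable",
        full_document_before_change="whenAvailable",
    ) as stream:
        for event in stream:
            if event["operationType"] == "update":
                print(summarize_update(event))
```

On servers older than 6.0, passing full_document="updateLookup" alone returns the current version of the document, with the caveat described above.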

MongoDB change streams form the basis for various MongoDB integration products and utilities. For instance, the MongoDB Kafka connector works by listening to a MongoDB change stream and propagating the data from that stream to Kafka. Google and MongoDB have also released Dataflow templates that allow MongoDB change stream data to be forwarded to Google Cloud Pub/Sub topics and from there to Google BigQuery.

MongoDB change streams allow MongoDB to integrate more easily with downstream systems. They are relatively easy to use and are an important part of the MongoDB toolset.